ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Get_Proteins_app issues #303

Closed: asd1864714 closed this issue 1 month ago

asd1864714 commented 2 months ago

./pgap.py -n -o out_directory -g sequence.fasta -s Staphylococcus

cwltool.log

I don't know what the problem is anymore; the previous version worked fine.

azat-badretdin commented 2 months ago

Thank you for your report, user @asd1864714

Could you please provide sequence.fasta?

The error is in bacterial_prot_src:

Stack trace (most recent call last):
#9    Object "", at 0xffffffffffffffff, in 
#8    Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2024-04-27.build7426/arch/x86_64/bin/bacterial_prot_src", at 0x40812d, in _start
#7    Object "/usr/lib64/libc-2.28.so", at 0x7f25ff460d84, in __libc_start_main
#6    Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2024-04-27.build7426/arch/x86_64/bin/bacterial_prot_src", at 0x407f89, in main
#5    Source "/export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 1024, in AppMain [0x7f2600f827ac]
#4    Source "/export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 711, in x_TryMain [0x7f2600f7f0d2]
#3    Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2024-04-27.build7426/arch/x86_64/bin/bacterial_prot_src", at 0x40ef41, in CBacterialProtSrcApp::Run()
#2    Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2024-04-27.build7426/arch/x86_64/bin/bacterial_prot_src", at 0x40b970, in CBacterialProtSrcApp::GetProtInfInputSQLite(int, ncbi::CUnicollDumpAccess&, std::__cxx11::list<SDatabase_input, std::allocator<SDatabase_input> >&)
#1    Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2024-04-27.build7426/arch/x86_64/bin/bacterial_prot_src", at 0x409319, in 
#0    Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2024-04-27.build7426/arch/x86_64/bin/bacterial_prot_src", at 0x4128a4, in void ncbi::CDiagBuffer::Put<int>(ncbi::CNcbiDiag const&, int const&)
Segmentation fault (Address not mapped to object [0xfffffffffffffffc])

Could you please provide your system characteristics as well? They are usually printed at the top of cwltool.log, but not this time.

Thanks.

asd1864714 commented 2 months ago

I just downloaded a sequence from NCBI as a test. This is my system version: Ubuntu 20.04.4 LTS. Attachment: sequence.zip

azat-badretdin commented 2 months ago

Thanks. Ubuntu has been a tough nut for us historically, although we seem to be handling it better recently.

But let me first try to reproduce this locally.

azat-badretdin commented 2 months ago

I am getting closer to locating the source of the error here.

Meanwhile, as a workaround, I recommend using the species name instead of just the genus.

asd1864714 commented 2 months ago

OK, thanks. I've tried using species names before, but that didn't work either.

azat-badretdin commented 2 months ago

It should work for this species:


$ sqlite3 taxonomy.sqlite3 " select count(*) from taxidinfo where taxid=3051183"
-- Loading resources from /home/badrazat/.sqliterc
count(*)
1

It is present in our taxonomy.
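For anyone who prefers to run the same check from Python rather than the sqlite3 shell, here is a minimal sketch. The table name `taxidinfo` and column `taxid` come from the query above; everything else (the helper name `taxid_known` and the one-column stand-in schema) is an assumption made for illustration, and the real taxonomy.sqlite3 will have a richer schema.

```python
import sqlite3


def taxid_known(conn: sqlite3.Connection, taxid: int) -> bool:
    """Return True if the taxidinfo table has at least one row for taxid."""
    row = conn.execute(
        "SELECT COUNT(*) FROM taxidinfo WHERE taxid = ?", (taxid,)
    ).fetchone()
    return row[0] > 0


# Stand-in database so the sketch runs anywhere.
# ASSUMPTION: this one-column schema is hypothetical, not PGAP's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxidinfo (taxid INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO taxidinfo (taxid) VALUES (3051183)")

print(taxid_known(conn, 3051183))  # taxid from the query above
print(taxid_known(conn, 1))        # a taxid that was not inserted
```

To check a real installation, replace the in-memory connection with `sqlite3.connect(...)` pointed at your own copy of taxonomy.sqlite3.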

asd1864714 commented 2 months ago

Yes, it is Staphylococcus hsinchuensis. Sorry, I am not proficient in coding. I tried reinstalling Docker, but the same problem still occurred. I will try installing it on a CentOS machine later.

azat-badretdin commented 2 months ago

This species was renamed after we prepared the release on 4/28 (see the release version). You might have better luck with the old recognized name "Staphylococcus sp. H164".

azat-badretdin commented 2 months ago

You might have better luck with the old recognized name "Staphylococcus sp. H164"

I verified locally that this solution works.

asd1864714 commented 2 months ago

Thanks for your patient answer. It's working fine now.

azat-badretdin commented 2 months ago

Glad to hear that! We will keep the issue open, because the problem of -s not working with a genus-only name remains.

asd1864714 commented 2 months ago

Hello, I found that the run completes normally if “sp.” is added after the genus. It may help you find the problem. For example, ./pgap.py -n -o out_directory -g sequence.fasta -s "Frankia" becomes ./pgap.py -n -o out_directory -g sequence.fasta -s "Frankia sp."
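In case it is useful to others scripting around this, the workaround can be sketched as a small wrapper that appends "sp." whenever the organism name is a single word. The helper names `with_sp_suffix` and `pgap_command` are hypothetical and not part of pgap.py; the flags are copied from the command above. This is only meant for the affected release, on the assumption that a genus-only name is what triggers the crash.

```python
import shlex


def with_sp_suffix(organism: str) -> str:
    """Append ' sp.' when the name is a single word (genus only)."""
    name = " ".join(organism.split())  # collapse stray whitespace
    if name and len(name.split(" ")) == 1:
        return f"{name} sp."
    return name


def pgap_command(organism: str,
                 fasta: str = "sequence.fasta",
                 outdir: str = "out_directory") -> list:
    """Build the argument list for the pgap.py call used in this thread."""
    return ["./pgap.py", "-n", "-o", outdir, "-g", fasta,
            "-s", with_sp_suffix(organism)]


# Print a shell-safe version of the command for copy and paste.
print(shlex.join(pgap_command("Frankia")))
```

Names that already contain more than one word, such as "Staphylococcus sp. H164", pass through unchanged. The list returned by `pgap_command` can also be handed directly to `subprocess.run`.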

azat-badretdin commented 2 months ago

Thank you for your help, @asd1864714! We appreciate your input very much.

Yes, we have located the code that caused this regression, and we are working on a fix.

azat-badretdin commented 1 month ago

The code is fixed, and the fix will be available as part of the next release.