ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

WARNING Final process status is permanentFail but PGAP runs fine on test data #258

Closed bpeacock44 closed 1 year ago

bpeacock44 commented 1 year ago

Hello,

I have been trying to run PGAP on an assembly file. PGAP runs the test data without any issues. The final error I get in the cwltool.log is: WARNING Final process status is permanentFail

Running on Linux (22.04.2-Ubuntu, x86_64 GNU/Linux)

It's not clear to me what is causing the issue. I tried running it with and without the cpus option, as suggested elsewhere,

e.g.

./pgap.py -r -o strep_pgap_annotation contigs.yaml
./pgap.py -r --cpus 19 -o strep_pgap_annotation contigs.yaml
./pgap.py -r -o strep_pgap_annotation -g contigs.fna -s 'Streptomyces'
./pgap.py -r -c 19 -o strep_pgap_annotation -g contigs.fna -s 'Streptomyces'

All produce the same result.

I've attached the contigs.yaml and submol.yaml files I use (I had to add "txt" to the end to upload) and the cwltool.log. Please let me know if there is more information I can provide. Thank you!!

contigs.yaml.txt submol.yaml.txt cwltool.log

azat-badretdin commented 1 year ago

Thank you for your report, Beth!

You had the right idea to look for the permanentFail signal in cwltool.log.

The relevant part is the one preceding the first message of this kind (cwltool posts it after every nested, Russian-doll-style workflow that fails):

[2023-05-25 16:50:24] INFO [job screen_evaluate] /tmp/ppfm9qbc$ screen_evaluate \
    -ifmt \
    seq-annot \
    -tab \
    /tmp/dfitdnmy/stgc5b0c534-4fbe-44cc-870e-14252a26a128/calls.tab
[2023-05-25 16:50:25] DEBUG Could not collect memory usage, job ended before monitoring began.
[2023-05-25 16:50:25] WARNING [job screen_evaluate] exited with status: 1
[2023-05-25 16:50:25] WARNING [job screen_evaluate] completed permanentFail

You should be able to find a file called calls.tab containing non-empty locations and target sequence specifications, which indicate some contamination of your genome by adapter or vector sequences.
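For anyone triaging a similar log, here is a minimal Python sketch of the search described above: find the first job-level permanentFail line and show the lines just before it. It is not part of PGAP or cwltool. The function name `first_failed_job` and the `context` parameter are made up for illustration, and the regular expression assumes the log line format shown in the excerpt above.

```python
import re
from typing import List, Optional, Tuple

# Matches job-level failures such as
# "[...] WARNING [job screen_evaluate] completed permanentFail".
# Assumption: the log uses the line format quoted in this thread.
FAIL_RE = re.compile(r"\[job (?P<job>[^\]]+)\] completed permanentFail")


def first_failed_job(
    lines: List[str], context: int = 10
) -> Optional[Tuple[str, List[str]]]:
    """Return (job name, preceding lines plus the failure line) for the
    first job that completed with permanentFail, or None if there is none."""
    for i, line in enumerate(lines):
        match = FAIL_RE.search(line)
        if match:
            start = max(0, i - context)
            return match.group("job"), lines[start : i + 1]
    return None


# Small in-memory example built from the log lines quoted above.
sample_log = [
    "[2023-05-25 16:50:25] DEBUG Could not collect memory usage, job ended before monitoring began.",
    "[2023-05-25 16:50:25] WARNING [job screen_evaluate] exited with status: 1",
    "[2023-05-25 16:50:25] WARNING [job screen_evaluate] completed permanentFail",
]
hit = first_failed_job(sample_log, context=2)
if hit is not None:
    job_name, excerpt = hit
    print(f"first failed job: {job_name}")
    for log_line in excerpt:
        print(log_line)
```

To use it on a real run, read cwltool.log into a list of lines (for example with `open(path).read().splitlines()`) and pass that list in. The summary line "Final process status is permanentFail" is deliberately not matched, since it only repeats the failure of an inner job.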

Hope this helps!

Congratulations on being the first user to report an issue for our new May release!

-- Azat

bpeacock44 commented 1 year ago

I appreciate your quick response - thank you so much!

azat-badretdin commented 1 year ago

You are welcome, Beth!

bpeacock44 commented 1 year ago

Hi Azat,

I'm not sure if I should start a new issue or just reply here - please forgive me if the former. I removed problematic contigs, as there weren't very many, and tried running pgap again. It ran into another issue, and I'm not sure how to troubleshoot. It seems like at least one of my sequences couldn't be validated:

terminate called after throwing an instance of 'ncbi::CException'
what(): NCBI C++ Exception:
Error: BACTERIAL_PIPELINE(CException::eUnknown) "/export/home/gpipe/TeamCity/Agent1/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/app/bacterial_pipeline/prepare_seq_entry_input.cpp", line 783: CPrepareSeqEntryInputApp::Run() --- bacterial pipeline input validator failed at least in one input seq-entry
Stack trace:
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/bin/prepare_seq_entry_input :0 offset=0x0 addr=0x41328d
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent1/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:702 ncbi::CNcbiApplicationAPI::x_TryMain(ncbi::EAppDiagStream, char const, int, bool) offset=0x0 addr=0x7f7a6df4a822
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent1/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:1014 ncbi::CNcbiApplicationAPI::AppMain(int, char const const, char const const, ncbi::EAppDiagStream, char const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) offset=0x0 addr=0x7f7a6df4ddec
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/bin/prepare_seq_entry_input :0 offset=0x0 addr=0x40a389
/usr/lib64/libc-2.17.so :0 offset=0x0 addr=0x7f7a6b276554
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/bin/prepare_seq_entry_input :0 offset=0x0 addr=0x40a579
:0 offset=0x0 addr=0xffffffffffffffff

Stack trace (most recent call last):

12 Object "", at 0xffffffffffffffff, in

11 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/bin/prepare_seq_entry_input", at 0x40a579, in _start

10 Object "/usr/lib64/libc-2.17.so", at 0x7f7a6b276554, in __libc_start_main

9 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/bin/prepare_seq_entry_input", at 0x40a389, in main

8 Source "/export/home/gpipe/TeamCity/Agent1/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 1014, in AppMain [0x7f7a6df4ddec]

7 Source "/export/home/gpipe/TeamCity/Agent1/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 702, in x_TryMain [0x7f7a6df4a822]

6 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-05-17.build6771/arch/x86_64/bin/prepare_seq_entry_input", at 0x4132c5, in CPrepareSeqEntryInputApp::Run()

5 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_throw.cc", line 93, in __cxa_throw [0x7f7a6bdf8222]

4 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc", line 57, in terminate [0x7f7a6bdf7fe0]

3 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc", line 47, in __cxa_begin_catch [0x7f7a6bdf7f95]

2 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc", line 95, in __verbose_terminate_handler [0x7f7a6bdfa1a4]

1 Object "/usr/lib64/libc-2.17.so", at 0x7f7a6b28ba77, in abort

0 Object "/usr/lib64/libc-2.17.so", at 0x7f7a6b28a387, in raise

Aborted (Signal sent by tkill() 279 1000)
[2023-05-26 14:43:19] INFO [job Prepare_Seq_entries] Max memory used: 103MiB
[2023-05-26 14:43:19] WARNING [job Prepare_Seq_entries] was terminated by signal: SIGABRT
[2023-05-26 14:43:21] WARNING [job Prepare_Seq_entries] completed permanentFail

azat-badretdin commented 1 year ago

Your assessment is correct; indeed, that's what the log says here:

bacterial pipeline input validator failed at least in one input seq-entry

Is there anything else in the output above the snippet you posted, going back to where the command line is echoed?

bpeacock44 commented 1 year ago

Ah - I see - my genome is too big. Sorry for the trouble!!

Genome size 9.68661e+08 exceeds maximum genome size 3e+07

-Beth
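
A quick pre-flight check along these lines can catch this before a long run. The sketch below sums the sequence characters in a FASTA file and compares the total with the 3e+07 limit quoted in the error message above. The helper names `total_bases` and `exceeds_limit` are hypothetical, and it is an assumption that PGAP's validator counts bases in exactly this way; the limit may also differ between releases.

```python
from typing import Iterable

# 3e+07, taken from the error message in this thread; treat it as an
# assumption for other PGAP releases.
MAX_GENOME_SIZE = 30_000_000


def total_bases(fasta_lines: Iterable[str]) -> int:
    """Sum the lengths of all sequence lines, skipping '>' header lines
    and blank lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def exceeds_limit(fasta_lines: Iterable[str], limit: int = MAX_GENOME_SIZE) -> bool:
    """Return True when the combined sequence length is above the limit."""
    return total_bases(fasta_lines) > limit


# Tiny in-memory example; for a real assembly, pass an open file handle,
# e.g. `with open("contigs.fna") as fh: print(total_bases(fh))`.
example = [">contig_1", "ACGTACGT", "ACGT", ">contig_2", "GGCC"]
print(total_bases(example))
```

A total near 9.7e+08, as reported above, is far above what this limit allows, which suggests the input is not a single prokaryotic genome assembly.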

azat-badretdin commented 1 year ago

No problem, Beth. That's what we are here for: to help.