ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[BUG] WARNING Final process status is permanentFail #286

Closed snackens closed 5 months ago

snackens commented 5 months ago

Thank you for this convenient tool. I got the following error while running pgap.

Describe the bug: WARNING Final process status is permanentFail.

To Reproduce: Other genomes seemed to finish successfully. Only this genome failed. tmp-outdir.zip

Command used in this analysis:
./pgap.py -r -o pgap/result -g genome.fasta -s 'Helicobacter pylori' --taxcheck -D /opt/pkg/singularity-ce/4.0.0/bin/singularity
(Slightly modified: the FASTA file name and output directory shown here are example names.)

Log Files Please rerun pgap.py with the --debug flag and attach an archive (e.g. zip or tarball) of the logs in the directory: debug/tmp-outdir/*/*.log.

Best regards,

azat-badretdin commented 5 months ago

Thank you for posting this report, @snackens!

Could you please post the cwltool.log file as well? Thanks!

snackens commented 5 months ago

Thank you for your reply. Here is the file. cwltool.log

Thanks,

azat-badretdin commented 5 months ago

The key to reading cwltool.log is the last part before the first permanentFail message. Here it is:

[2024-02-13 16:15:27] INFO [job screen_evaluate] /pgap/output/debug/tmp-outdir/0444mft1$ screen_evaluate \
    -ifmt \
    seq-annot \
    -tab \
    /pgap/output/debug/tmpdir/5zc2p07y/stgcc008c4b-0885-40a4-bed6-6939a95226dc/calls.tab
[2024-02-13 16:15:27] DEBUG Could not collect memory usage, job ended before monitoring began.
[2024-02-13 16:15:27] WARNING [job screen_evaluate] exited with status: 1
[2024-02-13 16:15:27] WARNING [job screen_evaluate] completed permanentFail

This means that your input is contaminated. The calls.tab file should be in your output/ directory, and a copy should also be somewhere in the tmp-outdir.zip archive you posted. It shows which nucleotide sequences are contaminated, by what, and where.
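For readers who want to automate the log-reading step described above, here is a minimal Python sketch that prints the lines leading up to the first line containing permanentFail. The function names and the default of 10 context lines are illustrative choices, not part of PGAP or cwltool.

```python
from collections import deque
from typing import Iterable, List


def context_before_first_failure(lines: Iterable[str],
                                 marker: str = "permanentFail",
                                 context: int = 10) -> List[str]:
    """Return up to `context` lines preceding the first line that contains
    `marker`, followed by that line. Returns [] if the marker never appears."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            return list(recent) + [line]
        recent.append(line)
    return []


def print_failure_context(path: str) -> None:
    """Print the failure context for a log file at `path` (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in context_before_first_failure(handle):
            print(line)
```

Calling `print_failure_context("cwltool.log")` (adjust the path to wherever your log lives) should show the failing job's command line and exit status, much like the excerpt above.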

Please let me know how it goes.

snackens commented 5 months ago

Thank you!

The calls.tab file includes:

lcl|genome_030 X - PHG:phiX174 contam_in_prok

Does this mean that genome_030 is contaminated with something? Should I remove this contig, or not use this genome at all? Sorry if this question is not related to this program.

azat-badretdin commented 5 months ago

That is up to you. If you are curious about the annotation, the most straightforward way is to just remove this contig.

This is also what you would do before submitting the results to GenBank, if I am not mistaken.
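Removing the flagged contig can be done by hand, but a small script helps when several genomes are affected. The Python sketch below rests on two assumptions that are not confirmed in this thread: that the first whitespace-separated column of calls.tab is the sequence ID (as in the `lcl|genome_030` line quoted above), and that FASTA headers use the same ID without the `lcl|` prefix. Both function names are hypothetical.

```python
from typing import Iterable, Iterator, Set


def flagged_ids(calls_lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs from the first column of calls.tab lines.

    Assumption: a leading 'lcl|' is a local-ID prefix that is absent
    from the FASTA headers, so it is stripped here.
    """
    ids = set()
    for line in calls_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        seq_id = fields[0]
        if seq_id.startswith("lcl|"):
            seq_id = seq_id[len("lcl|"):]
        ids.add(seq_id)
    return ids


def drop_contigs(fasta_lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in `drop`.

    The record ID is taken to be the first word after '>' in the header.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = not header or header[0] not in drop
        if keep:
            yield line
```

A possible way to use it, with placeholder file names: open calls.tab and pass the handle to `flagged_ids`, then open genome.fasta, pass it to `drop_contigs` together with the resulting set, and write the yielded lines to a new file such as genome.filtered.fasta. Check the list of dropped IDs before rerunning pgap.py on the filtered file.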

snackens commented 5 months ago

I see.

Thank you for your kindness.

azat-badretdin commented 5 months ago

I was happy to help, @snackens! Can we close this issue?

snackens commented 5 months ago

Yes, thank you so much.

azat-badretdin commented 5 months ago

You are welcome!