[BUG] <WARNING Final process status is permanentFail>

ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Issue #187 · Closed by arkruem 2 years ago

arkruem commented 2 years ago

Hello PGAP developers,

Describe the bug
I used the test data and PGAP performed as expected, but when I use my own data I get the error: WARNING Final process status is permanentFail

To Reproduce
I was trying to annotate a known reference genome of C. botulinum (and yes, I can share the genome). The input and submol files I used are attached, renamed to .txt just for reporting the issue: input.yaml.txt, Hall_submol.yaml.txt

Command used: ./pgap.py -D singularity -r -o Hall_r Annotate/A1Hall/input.yaml

Expected behavior
The genome is annotated successfully.

Software versions (please complete the following information):

OS: CentOS Linux 7
pgap.py --version: PGAP version 2022-02-10.build5872 is up to date
Docker (or other container runner) version: singularity-ce version 3.8.0

Log files
cwltool.log

azat-badretdin commented 2 years ago

Thank you for your report! Could you please post the first line of A1_Hall_009698.fasta? Thanks!

arkruem commented 2 years ago

Thank you for your reply, Azat. Here is the first line:
>gi|153934468|ref|NC_009698.1| Clostridium botulinum A str. Hall chromosome, complete genome

azat-badretdin commented 2 years ago

Thanks. Could you please try to use a simpler SEQID?


>gi|153934468  Clostridium botulinum A str. Hall chromosome, complete genome

arkruem commented 2 years ago

Still getting the permanentFail

see log:
cwltool.log

azat-badretdin commented 2 years ago

I see. But the error in the stack trace is different now: "Attempt to access NULL pointer"

It looks like my suggestion was ill-advised. I reviewed https://github.com/ncbi/pgap/wiki/Input-Files, and we do not allow such SeqIDs: the '|' character is not in the permitted set. See:

Each sequence in the file must have a definition line beginning with '>' and a unique identifier (SeqID), e.g. >contig001 or >contig002. The SeqIDs must:

Be less than 50 characters long
Only include letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).
Be unique within a genome
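A minimal Python sketch for checking a FASTA file's deflines against the three rules quoted above before launching PGAP. It is not part of PGAP; the function names are hypothetical, and it assumes "letters" means ASCII letters and that the SeqID is the text between '>' and the first whitespace.

```python
import re
from typing import Dict, Iterable, List

# Allowed SeqID characters as quoted from the PGAP wiki:
# letters, digits, - _ . : * #  (assumption: ASCII letters only)
_ALLOWED_CHAR = re.compile(r"[A-Za-z0-9_.:*#-]")


def seqid_problems(seqid: str) -> List[str]:
    """Return the reasons a single SeqID breaks the length/character rules (empty list if none)."""
    if not seqid:
        return ["empty SeqID"]
    problems = []
    if len(seqid) >= 50:
        problems.append(f"length {len(seqid)} is not less than 50")
    # Collect each offending character once, in sorted order.
    bad = sorted({c for c in seqid if not _ALLOWED_CHAR.fullmatch(c)})
    if bad:
        problems.append("disallowed characters: " + " ".join(repr(c) for c in bad))
    return problems


def check_fasta_seqids(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each problematic SeqID in FASTA lines to its problems, including duplicates."""
    seen = set()
    report: Dict[str, List[str]] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not checked
        # Assumption: the SeqID is everything between '>' and the first whitespace.
        fields = line[1:].split(maxsplit=1)
        seqid = fields[0] if fields else ""
        problems = seqid_problems(seqid)
        if seqid in seen:
            problems.append("duplicate SeqID")
        seen.add(seqid)
        if problems:
            report[seqid] = problems
    return report
```

Applied to the defline from this thread, seqid_problems("gi|153934468|ref|NC_009698.1|") returns ["disallowed characters: '|'"], and the simplified "gi|153934468" is flagged the same way, which is consistent with both attempts failing.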

arkruem commented 2 years ago

Thanks, Azat! I removed the non-accepted characters, and it worked successfully!!!
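The fix described here (removing the characters PGAP does not accept) can be scripted. This is a hedged sketch, not a PGAP tool: sanitize_fasta is a hypothetical helper, it assumes "letters" means ASCII letters and that the SeqID ends at the first whitespace, and because stripping characters can leave a SeqID empty or make two SeqIDs collide, it raises an error instead of guessing a new name.

```python
import re
from typing import Iterable, Iterator

# Anything outside the allowed set quoted from the PGAP wiki:
# letters, digits, - _ . : * #  (assumption: ASCII letters only)
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.:*#-]")


def sanitize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with disallowed characters removed from each SeqID.

    Sequence lines pass through untouched and the description after the
    first whitespace is kept. Raises ValueError if a cleaned SeqID is empty,
    50 characters or longer, or duplicates an earlier one.
    """
    seen = set()
    for line in lines:
        if not line.startswith(">"):
            yield line
            continue
        fields = line[1:].strip().split(maxsplit=1)
        seqid = _DISALLOWED.sub("", fields[0]) if fields else ""
        if not seqid or len(seqid) >= 50 or seqid in seen:
            raise ValueError(f"cleaned SeqID {seqid!r} is unusable; rename it by hand")
        seen.add(seqid)
        description = " " + fields[1] if len(fields) > 1 else ""
        yield ">" + seqid + description + "\n"
```

For example, with src opened on A1_Hall_009698.fasta and dst opened for writing on a new file of your choosing, dst.writelines(sanitize_fasta(src)) turns the original defline into ">gi153934468refNC_009698.1 Clostridium botulinum A str. Hall chromosome, complete genome". Deleting the characters mirrors what was done in this thread; replacing them with an allowed character such as '_' would work equally well.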

azat-badretdin commented 2 years ago

You are very welcome!