ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Unable to complete annotation #196

Closed: AerdnaIttualoc closed this issue 2 years ago

AerdnaIttualoc commented 2 years ago

Hi,

I am trying to annotate prokaryotic genomes with PGAP, which I installed following the Quick Start guide: https://github.com/ncbi/pgap/wiki/Quick-Start

Testing with the following command works, so there should be no installation problems: ./pgap.py -r -o mg37_results test_genomes/MG37/input.yaml

However, when I run ./pgap.py -r path/to/Strain.yaml -o path/to/output, the terminal reports "WARNING Final process status is permanentFail".

This is my log file: cwltool.log

What am I doing wrong?

Thanks

azat-badretdin commented 2 years ago

Thank you for your report, user AerdnaIttualoc!

The key to analyzing cwltool.log is to find the first node that failed with the message permanentFail, and then look for the output files whose names are mentioned in the part of the report that precedes that message.

For example, in your case, the first reference to permanentFail comes at the job fastaval, whose command line points to an output file named fastaval.xml.

Unfortunately, in this case, cwltool.log indicates that this output file was most likely deleted as a transient file.
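The search described above can be automated with a short script. This is a minimal sketch, assuming cwltool.log is plain text and that the failing job's report (command line, output file names) sits in the lines just before the first line mentioning permanentFail. The "[job NAME]" / "[step NAME]" tag pattern and the 40-line context window are assumptions, not documented PGAP behavior.

```python
"""Locate the first permanentFail in cwltool.log and show the lines before it.

Assumptions: the log is plain text; the failing job's report appears shortly
before the first permanentFail line; job lines are tagged "[job NAME]" or
"[step NAME]". The default context size is arbitrary.
"""
import re
import sys
from typing import List, Optional, Tuple

# Assumed tag format for the job or step that a log line belongs to.
JOB_TAG = re.compile(r"\[(?:job|step) ([^\]]+)\]")


def first_permanent_fail(
    lines: List[str], context: int = 40
) -> Optional[Tuple[int, Optional[str], List[str]]]:
    """Return (0-based line index, job name or None, preceding lines), or None."""
    for i, line in enumerate(lines):
        if "permanentFail" in line:
            match = JOB_TAG.search(line)
            job = match.group(1) if match else None
            start = max(0, i - context)
            return i, job, lines[start:i]
    return None


def main(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
    hit = first_permanent_fail(lines)
    if hit is None:
        print("no permanentFail found")
        return
    index, job, before = hit
    print(f"first permanentFail at line {index + 1}, job: {job}")
    # The preceding report should name the job's output files
    # (fastaval.xml in this thread); look for those on disk next.
    for line in before:
        print(line)


# Only runs when a log path is given, e.g.: python first_fail.py cwltool.log
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The script name first_fail.py is hypothetical; it only narrows down where to look, and the log itself remains the authoritative source.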

I suggest rerunning your input with the --debug command-line flag, then locating the file by running find . -name fastaval.xml and examining its contents. It is usually self-descriptive.
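Once the run is kept with --debug, the find step and a quick summary of the file can be sketched in Python. This assumes fastaval.xml is well-formed XML whose diagnostics are message elements carrying the attributes shown later in this thread (severity, code, seq_id); the root element name is not assumed, and the summary format is only illustrative.

```python
"""Find fastaval.xml files under a directory and summarize their messages.

Assumptions: fastaval.xml is well-formed XML; each diagnostic is a <message>
element with attributes such as severity, code and seq_id, and a text body.
"""
import os
import sys
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List


def find_files(root: str, name: str = "fastaval.xml") -> Iterator[str]:
    """Rough Python equivalent of: find ROOT -name fastaval.xml"""
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            yield os.path.join(dirpath, name)


def read_messages(root: ET.Element) -> List[Dict[str, str]]:
    """Return one dict per <message> element: its attributes plus its text."""
    messages = []
    for elem in root.iter("message"):
        entry = dict(elem.attrib)
        entry["text"] = (elem.text or "").strip()
        messages.append(entry)
    return messages


# Usage (hypothetical script name): python fastaval_summary.py path/to/output
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in find_files(sys.argv[1]):
        for msg in read_messages(ET.parse(path).getroot()):
            print(path, msg.get("severity"), msg.get("code"),
                  msg.get("seq_id"), msg["text"])
```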

AerdnaIttualoc commented 2 years ago

Thanks for your help, that issue is now fixed. However, when I try to upload the results to the submission portal, it reports this validation error: ERROR: valid [SEQ_INST.CompleteGenomeHasGaps] Title contains 'complete genome' but sequence has gaps BIOSEQ: lcl|NODE_1_length_1485511_cov_20.184210: delta, dna len= 1485511

Where and how can I specify that the genome I'm annotating is not a "complete genome", so that this error goes away?

Thanks!

azat-badretdin commented 2 years ago

Please try adding [tech=wgs] to all your FASTA headers and running the genome through PGAP again.
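Editing every header by hand is error-prone for multi-contig assemblies, so here is a minimal Python sketch of the suggested change. It assumes header lines start with '>' and skips headers that already carry a [tech=...] modifier; the script and file names in the usage line are placeholders.

```python
"""Append a [tech=wgs] modifier to every FASTA header line.

A sketch under the assumptions that headers start with '>' and that any
header already containing "[tech=" should be left unchanged.
"""
import sys
from typing import Iterable, Iterator

MODIFIER = "[tech=wgs]"


def add_modifier(lines: Iterable[str], modifier: str = MODIFIER) -> Iterator[str]:
    """Yield lines without trailing newlines, with the modifier on headers."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">") and "[tech=" not in line:
            line = f"{line} {modifier}"
        yield line


# Usage (placeholder names): python add_tech.py genome.fasta genome.wgs.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    src, dst = sys.argv[1], sys.argv[2]
    with open(src) as fin, open(dst, "w") as fout:
        for out_line in add_modifier(fin):
            fout.write(out_line + "\n")
```

Writing to a new file rather than editing in place keeps the original assembly intact in case the modifier needs to change.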

AerdnaIttualoc commented 2 years ago

Is there a tutorial that documents this information, besides the GitHub wiki? https://github.com/ncbi/pgap/wiki/Input-Files

So this parameter must be set in the FASTA file and not in metadata.yaml?

azat-badretdin commented 2 years ago

Is there a tutorial reporting these information

I believe there is a reference somewhere on adding these modifiers to FASTA headers. I haven't used it for a while, and I have now discovered that it has moved. I have asked someone else for help.

therefore this parameter must be modified in the fasta and not in the metadata.yaml?

That is true. I believe you can't currently set this parameter in the metadata file.

azat-badretdin commented 2 years ago

I asked somebody else for help.

Karen Clark from the GenBank team helped, and here is an NCBI reference resource describing the different FASTA header modifiers: https://www.ncbi.nlm.nih.gov/genbank/mods_fastadefline/

AerdnaIttualoc commented 2 years ago

Thanks for the invaluable help, including Karen's, who has kindly replied to my emails on several occasions.

One last question: if I receive an "Internal N's chunk (size:100)" message in the fastaval.xml output, what should I do? cwltool.log

<message tool="fastaval" severity="WARNING" seq_id="NODE_2_length_437575_cov_23.224847" code="SEQ_INTERNAL_N" fasta_seq_id="lcl|NODE_2_length_437575_cov_23.224847">Internal N's chunk (size:100) from 483</message>

azat-badretdin commented 2 years ago

If I receive Internal N's chunk (size:100) error from fastaval.xml output what should I do?

You can break the contig into two at the run of Ns, drop the contig entirely, drop the small part of it, or simply edit it.
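The first and third options (splitting the contig, dropping a small piece) can be sketched as follows. This assumes the sequence is a plain nucleotide string and treats any run of at least min_run Ns (100 in the warning quoted above) as a cut point; the _partN naming and the min_piece filter are hypothetical choices, not anything fastaval or PGAP prescribes.

```python
"""Split a contig at internal runs of N, optionally dropping short pieces.

Assumptions: seq is a nucleotide string; runs of >= min_run N/n characters
are cut points; pieces shorter than min_piece are discarded. The output
naming scheme (NAME_part1, NAME_part2, ...) is hypothetical.
"""
import re
from typing import List, Tuple


def split_at_n_runs(
    name: str, seq: str, min_run: int = 100, min_piece: int = 1
) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs; an all-N contig yields an empty list."""
    gap = re.compile("[Nn]{%d,}" % min_run)
    # len("") == 0, so empty pieces from leading/trailing runs are dropped too.
    pieces = [p for p in gap.split(seq) if len(p) >= min_piece]
    if len(pieces) == 1:
        # Nothing to split (or only one piece survived): keep the original name.
        return [(name, pieces[0])]
    return [(f"{name}_part{i}", p) for i, p in enumerate(pieces, start=1)]
```

For example, split_at_n_runs("NODE_2", seq) on a contig with one 100-N run returns two records, NODE_2_part1 and NODE_2_part2. Splitting changes the assembly, so whether it is preferable to editing or dropping the contig depends on what the gap represents.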

You can also try running it with the --ignore-all-errors option, which will get you past the fastaval guard dog, but what happens downstream is unpredictable.