ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Test data works correctly, but Personal data not functioning properly #210

Closed ghost closed 2 years ago

ghost commented 2 years ago

Hi Team, I have installed all the dependencies (such as PGAP version 2022-04-14.build6021) and tested the given data. The test data runs fine with the given command line: ./pgap.py -r -o mg37_results $HOME/.pgap/test_genomes/MG37/input.yaml

But something went wrong when I tried to use my own draft genome assembly, which includes 107 contigs:

contigZB100000 [organism=Xanthomonas sacchari] [isolate=JT8-3-1]
contigZB100001 [organism=Xanthomonas sacchari] [isolate=JT8-3-1]
contigZB100002 [organism=Xanthomonas sacchari] [isolate=JT8-3-1]
contigZB100003 [organism=Xanthomonas sacchari] [isolate=JT8-3-1]
contigZB100004 [organism=Xanthomonas sacchari] [isolate=JT8-3-1]
contigZB100005 [organism=Xanthomonas sacchari] [isolate=JT8-3-1]
......

I created input.yaml as follows: [image] and submol.yaml as follows: [image] and then ran ./pgap.py -r -o JT8-3-1 input.yaml

The error log is as follows:

cwltool.log calls.txt

I don't know how to deal with this problem, can you provide some advice? Thank you very much!

azat-badretdin commented 2 years ago

Thank you for your report, user @lkj66666

And especially, for posting the file calls.txt

This file contains hits from some of your contigs to a specific type of contamination of prokaryotic genomes - adaptors.

You can either ignore this error by specifying --ignore-all-errors on your command line, or edit or remove the affected contigs.

The second column of the file you posted shows the recommended action: X means that we recommend removing the contig in its entirety, and M means that we advise removing the indicated portion of the sequence.
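For readers who want to script the "eliminate the contigs" option, here is a minimal Python sketch. It is not part of PGAP. It assumes that calls.txt is tab-separated, that the first column holds the sequence ID exactly as it appears in the FASTA header, and that the second column holds the action, as described above. The function names are hypothetical. Contigs flagged M are left untouched because the layout of the range column is not shown in this thread.

```python
def contigs_to_drop(calls_text):
    """Return the IDs whose recommended action (second column) is X.

    Assumption: tab-separated columns, ID in column 1, action in column 2.
    """
    drop = set()
    for line in calls_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].strip() == "X":
            drop.add(fields[0].strip())
    return drop


def filter_fasta(fasta_text, drop_ids):
    """Copy FASTA records, skipping every record whose ID is in drop_ids."""
    kept = []
    keep = True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            seq_id = header[0] if header else ""
            keep = seq_id not in drop_ids
        if keep:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

If the IDs in calls.txt carry a prefix that the FASTA headers lack (the log in this thread shows IDs such as lcl|contig1), they would need to be normalized before matching.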

Please let me know if this helps,

Cheers.

ghost commented 2 years ago

Hi @azat-badretdin, thanks for your advice. I found the whole genome sequence of a bacterium and ran it with and without --ignore-all-errors, but it still failed. The log files are as follows:

./pgap.py -r -o HR1-32 input.yaml
cwltool.log calls.tab.log

================================================

./pgap.py --ignore-all-errors -r -o HR1-32 input.yaml
cwltool.log calls.tab.log

And my genome: HR1-32.txt

Can you help me figure out what the problem is? Thanks very much

azat-badretdin commented 2 years ago

Sure. Here is what the log says this time:


[2022-06-15 02:22:36] INFO [job Prepare_Unannotated_Sequences] /tmp/z51q6xpo$ bacterial_prepare_unannotated \
    -submission-mode-genbank \
    -nogenbank \
    -asn-cache \
    /tmp/jlvq1vpr/stg39df68ae-4e18-4c4f-a915-b56f9222a399/sequence_cache \
    -gc-assembly \
    /tmp/jlvq1vpr/stgaffb909c-012e-4b6a-ba61-32a8a46672d0/gencoll.asn \
    -ids \
    /tmp/jlvq1vpr/stg4ea7099e-1500-4ae8-ac8b-1d1f769dbab2/all.gi \
    -master-desc \
    master-desc.asn \
    -plasmids \
    plasmids.seqids \
    -o \
    sequences.asn \
    -submit-block \
    /tmp/jlvq1vpr/stg8421c241-4f5b-483c-807e-1cb1d96164a4/submit_block_template.asn \
    -taxon-db \
    /tmp/jlvq1vpr/stg184e9782-4eb1-4fac-bf04-f4f7f6747fee/taxonomy.sqlite3
Reading assembly
  found 1 ids
Reading the submit block information
Processing sequences...
  processing lcl|contig1
Error: source descriptor does not have the genetic code: lcl|contig1

We will have a look at your sequence in a new internal ticket, since you have given us your FASTA sequence....

azat-badretdin commented 2 years ago

While we narrow down the cause of the problem, the workaround seems to be removing the [isolate] modifier from your FASTA file. I tried it locally and it works.
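A minimal Python sketch of that workaround follows. It is not an official PGAP tool. It assumes the FASTA file uses standard header lines starting with ">" and that the modifier is written as [isolate=...], as in the headers posted earlier in this thread. The function name strip_isolate is hypothetical.

```python
import re

# Matches one "[isolate=...]" modifier plus any whitespace in front of it.
ISOLATE = re.compile(r"\s*\[isolate=[^\]]*\]")


def strip_isolate(fasta_text):
    """Remove [isolate=...] modifiers from FASTA header lines only."""
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = ISOLATE.sub("", line)
        lines.append(line)  # sequence lines pass through unchanged
    return "\n".join(lines) + "\n"
```

Other modifiers such as [organism=...] are left in place, so only the one modifier named in the workaround is changed.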

As for the ultimate cause, we will continue investigating.

ocnoscr commented 2 years ago

Could you help me? I installed all dependencies (such as PGAP version 2022-04-14.build6021) and the test data ran fine. The following command runs without problems when the submol.yaml file contains only the basic data (topology and organism):

~/pgap.py --ignore-all-error -r -o L23_annot L23.input.yaml

But after adding the personal data (contact_info, authors, bioproject, biosample and sra) to the submol.yaml file, the run does not execute correctly.

Log Files cwltool.log

Thanks very much

azat-badretdin commented 2 years ago

@ocnoscr

You have this:


ncbi::CObjectIStream::ExpectedMember() --- line 1: member department expected ( at JsonValue.contact_info)

The error says that a department member is expected in contact_info. Please try to fix this in your input submol.yaml, and if it still fails, please open a new issue. Thanks!
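A quick pre-flight check for this kind of error can be sketched in Python as below. It is only an illustration: the function name missing_contact_fields is hypothetical, and the only field confirmed as required by the error message above is department, so the list of required names is left to the caller. The sketch takes an already parsed mapping; loading submol.yaml itself (for example with PyYAML's yaml.safe_load) is left out to keep the example self-contained.

```python
def missing_contact_fields(submol, required):
    """Return the required names that are absent or empty in contact_info."""
    contact = submol.get("contact_info") or {}
    return [name for name in required if not contact.get(name)]


# Example with a hand-written mapping standing in for a parsed submol.yaml.
submol = {"contact_info": {"first_name": "Jane", "last_name": "Doe"}}
print(missing_contact_fields(submol, ["department"]))
```

Running the check before launching pgap.py reports the missing member up front, rather than partway through a run.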

ocnoscr commented 2 years ago

@azat-badretdin

Thank you very much for the help, that was the problem.

Cheers