zheminzhou / PEPPAN

Phylogeny Enhanced Prediction of PAN-genome
https://doi.org/10.1101/2020.01.03.894154
GNU General Public License v3.0

"Error: CDS has no name." #16

Closed lalalagartija closed 4 years ago

lalalagartija commented 4 years ago

Hi, I get the error "Error: CDS has no name." together with the following output:

`multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/dist-packages/PEPPAN/PEPPAN.py", line 153, in iter_readGFF
    assert len(name) > 0, logger('Error: CDS has no name. {0}'.format(line))
AssertionError: None
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/PEPPAN", line 10, in
2020-06-29 18:44:50.803920 Error: CDS has no name. 8NC_024451.1 Armadillidium vulgare iridescent virus complete genome GeneMark.hmm CDS 1 4458 206.605609 + 0 gene_id=1
2020-06-29 18:44:50.804753 Error: CDS has no name. 8NC_001659.2 African swine fever virus strain BA71V, complete genome GeneMark.hmm CDS 2204 3286 33.678321 - 0 gene_id=1
2020-06-29 18:44:50.804778 Error: CDS has no name. 8NC_008724.1 Acanthocystis turfacea Chlorella virus 1, complete genome GeneMark.hmm CDS 1 1134 21.204768 - 0 gene_id=1
2020-06-29 18:44:50.804820 Error: CDS has no name. 8NC_005832.1 Ambystoma tigrinum virus, complete genome GeneMark.hmm CDS 70 981 19.933493 - 0 gene_id=1
2020-06-29 18:44:50.805565 Error: CDS has no name. 8NC_023848.1 Anopheles minimus irodovirus isolate AMIV, complete genome GeneMark.hmm CDS 187 747 4.665016 + 0 gene_id=2
    sys.exit(ortho())
  File "/usr/local/lib/python3.7/dist-packages/PEPPAN/PEPPAN.py", line 1815, in ortho
    genomes, genes = readGFF(params['GFFs'], params['feature'], params['gtable'])
  File "/usr/local/lib/python3.7/dist-packages/PEPPAN/PEPPAN.py", line 188, in readGFF
    for ss, cc in pool.imap_unordered(iter_readGFF, [[fn, feature, gtable] for fn in fnames]) :
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AssertionError: None
2020-06-29 18:44:50.853379 Error: CDS has no name. 8JN885990.1 Megavirus courdo7 isolate Mv13-c7, partial genome GeneMark.hmm CDS 2 166 13.518822 - 0 gene_id=1
peppan_line.sh: line 2: -t: command not found`

This is what my GFF looks like:

gff-version

source-version

date:

8NC_020104.1 Acanthamoeba polyphaga moumouvirus, complete genome GeneMark.hmm CDS 721 927 9.876024 - 0 gene_id=1
9NC_020104.1 Acanthamoeba polyphaga moumouvirus, complete genome GeneMark.hmm CDS 1526 2029 14.812729 + 0 gene_id=2
10NC_020104.1 Acanthamoeba polyphaga moumouvirus, complete genome GeneMark.hmm CDS 2790 3554 40.878727 - 0 gene_id=3

What is the problem?

Thank you very much in advance,

zheminzhou commented 4 years ago

PEPPAN requires one of "locus_tag", "name" or "ID" in the last column to assign every CDS a name. Your GFF has "gene_id=1" but none of the keys that PEPPAN recognizes. Replace "gene_id" with "ID" or "name" for PEPPAN to work.

However, "1" is too trivial to serve as a formal name, so I suggest adding a prefix.
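The renaming suggested above could be scripted roughly as follows. This is a minimal sketch, not part of PEPPAN: `rename_gene_id` and the prefix `moumou` are hypothetical names chosen for illustration, and the code assumes a tab-separated file whose ninth column holds `key=value` attributes. It writes `ID=`, one of the keys mentioned in the reply; how exactly PEPPAN matches keys (for example, whether matching is case-sensitive) is not stated in this thread.

```python
import re


def rename_gene_id(line, prefix):
    """Return the line with gene_id=<n> in column 9 rewritten as ID=<prefix>_<n>.

    Comment lines, blank lines, and lines with fewer than nine
    tab-separated columns are returned unchanged (minus the newline).
    """
    body = line.rstrip("\n")
    if not body.strip() or body.startswith("#"):
        return body
    cols = body.split("\t")
    if len(cols) < 9:
        return body
    # A function replacement avoids any escaping issues in the prefix.
    cols[8] = re.sub(
        r"\bgene_id=([^;\s]+)",
        lambda m: "ID={0}_{1}".format(prefix, m.group(1)),
        cols[8],
    )
    return "\t".join(cols)


# Illustrative record with nine tab-separated columns (assumed layout).
sample = "NC_020104.1\tGeneMark.hmm\tCDS\t721\t927\t9.876024\t-\t0\tgene_id=1"
print(rename_gene_id(sample, "moumou"))  # last column becomes ID=moumou_1
```

Using a different prefix per genome keeps the resulting names distinct across input files.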

lalalagartija commented 4 years ago

Thank you for your answer! I still get the same message for all my GFFs, but they now look like this:

gff-version 2

source-version GeneMark.hmm_PROKARYOTIC 3.36

date: Mon Jun 29 17:05:18 2020

Sequence file name: ../../Nucl/Tunisvirus_fontaine.fasta

Model file name: GeneMark_hmm_combined.mod

RBS: true

Model information: GeneMarkS_virus

KF483846.1 Tunisvirus fontaine2 strain U484, complete genome GeneMark.hmm CDS 305 796 15.632905 + 0 name=Tunis_fontaineV2
KF483846.1 Tunisvirus fontaine2 strain U484, complete genome GeneMark.hmm CDS 818 1615 23.278046 - 0 name=Tunis_fontaineV3
KF483846.1 Tunisvirus fontaine2 strain U484, complete genome GeneMark.hmm CDS 1650 1853 7.205357 + 0 name=Tunis_fontaineV4
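To see what the last column of a file like this actually contains, one option is to list the attribute keys line by line. The sketch below is a generic helper, not part of PEPPAN; `attribute_keys` is a hypothetical name, and it assumes tab-separated columns with semicolon-separated `key=value` pairs in column nine. It makes no claim about why the renamed files above still fail.

```python
def attribute_keys(line):
    """Return the attribute keys found in column 9 of a feature line.

    Returns None for blank lines, comment lines, and lines that do not
    split into at least nine tab-separated columns.
    """
    body = line.rstrip("\n")
    if not body.strip() or body.startswith("#"):
        return None
    cols = body.split("\t")
    if len(cols) < 9:
        return None
    keys = []
    for field in cols[8].split(";"):
        field = field.strip()
        if field:
            # Keep only the part before the first "=".
            keys.append(field.split("=", 1)[0].strip())
    return keys


# Illustrative lines: a header, a tab-separated record, a space-separated record.
sample_lines = [
    "##gff-version 2",
    "KF483846.1\tGeneMark.hmm\tCDS\t305\t796\t15.632905\t+\t0\tname=Tunis_fontaineV2",
    "KF483846.1 GeneMark.hmm CDS 305 796 15.632905 + 0 name=Tunis_fontaineV2",
]
for number, text in enumerate(sample_lines, start=1):
    print(number, attribute_keys(text))
```

A `None` result on a feature line means the line did not split into nine tab-separated columns, which is worth checking before looking at the key names themselves.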