Open ramadatta opened 4 years ago
Thanks for the report! It's strange: the merR CDS does not have a protein ID.
That is because it is a pseudogene, but it is not using the /pseudo
tag. Instead it uses a possibly new /pseudogene=""
tag that I have not seen before!
https://www.ncbi.nlm.nih.gov/nuccore/MN657242
CDS 58826..59308
/gene="merR"
/note="pR148AM_129; merR"
/pseudogene="unknown"
/codon_start=1
/transl_table=11
/product="Hg(II)-responsive transcriptional regulator
MerR"
Looks like it is new: http://www.insdc.org/documents/feature_table.html
The confusion is that it is missing the /pseudo
tag to go along with it?
Qualifier /pseudo
Definition indicates that this feature is a non-functional version of the
element named by the feature key
Value format none
Example /pseudo
Comment The qualifier /pseudo should be used to describe non-functional
genes that are not formally described as pseudogenes, e.g. CDS
has no translation due to other reasons than pseudogenisation events.
Other reasons may include sequencing or assembly errors.
In order to annotate pseudogenes the qualifier /pseudogene= must be
used indicating the TYPE which can be taken from the INSDC controlled vocabulary
for pseudogenes.
Qualifier /pseudogene=
Definition indicates that this feature is a pseudogene of the element named
by the feature key
Value format "TYPE"
where TYPE is one of the following:
processed, unprocessed, unitary, allelic, unknown
Example /pseudogene="processed"
/pseudogene="unprocessed"
/pseudogene="unitary"
/pseudogene="allelic"
/pseudogene="unknown"
Comment TYPE is a term taken from the INSDC controlled vocabulary for pseudogenes
(http://www.insdc.org/documents/pseudogene-qualifier-vocabulary):
processed: the pseudogene has arisen by reverse transcription of a
mRNA into cDNA, followed by reintegration into the genome. Therefore,
it has lost any intron/exon structure, and it might have a pseudo-polyA-tail.
unprocessed: the pseudogene has arisen from a copy of the parent gene by duplication
followed by accumulation of random mutations. The changes, compared to their
functional homolog, include insertions, deletions, premature stop codons, frameshifts
and a higher proportion of non-synonymous versus synonymous substitutions.
unitary: the pseudogene has no parent. It is the original gene, which is
functional in some species but disrupted in some way (indels, mutation,
recombination) in another species or strain.
allelic: a (unitary) pseudogene that is stable in the population but
importantly it has a functional alternative allele also in the population. i.e.,
one strain may have the gene, another strain may have the pseudogene.
MHC haplotypes have allelic pseudogenes.
unknown: the submitter does not know the method of pseudogenisation.
Thanks so much for quick reply @tseemann.
I understand the problem now. For the time being, I removed the MN657242 plasmid from the gbk database and ran prokka without any problem. Please advise if there is a fix that would let us include the MN657242 plasmid sequence. Thanks very much in advance!
For now you can edit the GBK file and change /pseudogene="unknown"
to /pseudo
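The manual edit suggested above can also be scripted when a database contains many such records. The following is a minimal sketch, not part of Prokka: the function name `pseudogene_to_pseudo` is hypothetical, and it assumes each /pseudogene qualifier sits alone on its own line, as in the MN657242 excerpt. Note that per the INSDC text quoted earlier the two qualifiers are not equivalent, so this is only a workaround for building the --proteins database, not a correction of the annotation.

```python
import re

# Matches a whole qualifier line such as:   /pseudogene="unknown"
# Group 1 keeps the leading indentation so the file layout is unchanged.
PSEUDOGENE_LINE = re.compile(
    r'^([ \t]*)/pseudogene="[^"\n]*"[ \t]*$', re.MULTILINE
)


def pseudogene_to_pseudo(text):
    """Return GenBank text with every /pseudogene="TYPE" line replaced by /pseudo."""
    return PSEUDOGENE_LINE.sub(r"\1/pseudo", text)
```

To apply it, read the GenBank file into a string, pass it through `pseudogene_to_pseudo`, and write the result to a new file, keeping the original as a backup.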
@tseemann noted. Thank you!
Hi Dr Seeman,
I am experiencing the same issue, but cannot find the problematic tag. I ran:
prokka sample.fasta --proteins db/plasmids.gbk --outdir ./prokka --prefix sample
Obtained the same error as above: Could not run command: prokka-genbank_to_fasta_db --format genbank /prokka\/proteins.faa 2> /dev/null
Then ran /path/to/plasmids/gbks/db.gbk > prokka\/proteins.faa 2> /dev/null
and there was no output. My proteins.faa file is now blank. Is there any way I can check what is causing the error?
Thank you, Daisy
Hello,
I had the same issue. Changing /pseudogene="unknown"
to /pseudo
helped.
Valery
Hi, I have the same problem: "Could not run command: prokka-genbank_to_fasta_db". I checked the reference gbk file I am using for /pseudogene="unknown", but it is not there. I have attached the gbk file in case you can see what I cannot. Could you kindly help me with this, please? Dania cluster_a.zip
Dear all,
I have the same problem:
Could not run command: prokka-genbank_to_fasta_db --format genbank /prokka/proteins.faa 2> /dev/null
and there is no /pseudogene="unknown"
in my GenBank file. I guessed that the file might contain some unusual characters that Prokka cannot handle, so I edited the Prokka source code and changed
prokka-genbank_to_fasta_db --format genbank All_NDM_Assigned_Plasmids_byMash_Plsdb\.gb > PROKKA_05222020_with_AllNDM_plasmids\/proteins\.faa 2> /dev/null
to
prokka-genbank_to_fasta_db --format genbank All_NDM_Assigned_Plasmids_byMash_Plsdb\.gb > PROKKA_05222020_with_AllNDM_plasmids\/proteins\.faa
that is, I removed the 2> /dev/null redirection so that error messages are no longer discarded. When I ran prokka again, it printed which line has the error. Removing the offending record makes it work.
Thanks!
The error is like this:
Feature #1 does not have any of these tags: protein_id locus_tag db_xref at /xxxx/bin/prokka-genbank_to_fasta_db line 57, <> line 62075908.
I found that some features in the file have no "locus_tag", so I removed them.
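For anyone who cannot spot the offending feature by eye, the check described in that error message can be approximated before running prokka. The sketch below is an assumption-laden text scan, not Prokka's own code: the function name `cds_without_ids` is hypothetical, it assumes only CDS features matter, and it relies on standard GenBank flat-file indentation (feature keys indented by five spaces, qualifiers indented further). A wrapped qualifier value whose continuation line happens to begin with "/" could be misread as a qualifier.

```python
import re

# The three tags named in the prokka-genbank_to_fasta_db error message.
ID_TAGS = {"protein_id", "locus_tag", "db_xref"}

# Feature key lines: exactly five spaces, the key, then the location.
FEATURE_START = re.compile(r"^ {5}(\S+) +(\S+)")
# Qualifier lines: indented deeper than feature keys and starting with "/".
QUALIFIER = re.compile(r"^ {6,}/(\w+)")


def cds_without_ids(lines):
    """Return (record, location) for each CDS lacking all three ID tags."""
    missing = []
    record = None      # name taken from the most recent LOCUS line
    feature = None     # (key, location, qualifier names) being collected
    in_table = False   # True while inside a FEATURES table

    def close(done):
        # Report a finished CDS that carries none of the ID tags.
        if done and done[0] == "CDS" and not (done[2] & ID_TAGS):
            missing.append((record, done[1]))

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("LOCUS"):
            record = line.split()[1]
        elif line.startswith("FEATURES"):
            in_table = True
        elif in_table and line[:1] not in (" ", ""):
            # An unindented line (ORIGIN, CONTIG, //) ends the table.
            close(feature)
            feature, in_table = None, False
        elif in_table:
            start = FEATURE_START.match(line)
            qual = QUALIFIER.match(line)
            if start:
                close(feature)
                feature = (start.group(1), start.group(2), set())
            elif qual and feature:
                feature[2].add(qual.group(1))
    close(feature)
    return missing
```

Passing an open file handle for the GenBank database to `cds_without_ids` returns a list of record names and feature locations to inspect, fix, or remove.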
Hi Seeman,
I downloaded a few plasmid sequences in GenBank (full) format from NCBI for plasmid annotation and ran the following command:
prokka ENT2_Contig6_len_41186_circ_NDM-1_Plasmid.fasta --outdir PROKKA_05222020_with_AllNDM_plasmids --proteins All_NDM_Assigned_Plasmids_byMash_Plsdb.gb
However, I am getting the following error:
[20:53:55] Could not run command: prokka-genbank_to_fasta_db --format genbank All_NDM_Assigned_Plasmids_byMash_Plsdb\.gb > PROKKA_05222020_with_AllNDM_plasmids\/proteins\.faa 2> /dev/null
I ran the above command separately and found that the problem is at Feature 22 of plasmid MN657242.
Could I request your help to overcome this? Thanks.
P.S: I am using prokka 1.14.6.