tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

COORDINATES: qualifier for Infernal/Aragorn output #47

Closed wksmits closed 9 years ago

wksmits commented 9 years ago

I have been using Prokka to annotate de novo generated whole genome sequences of bacteria, based on the species or a trusted database of proteins. I import Prokka's GBK output into Artemis, where I tweak the annotation (adding missed pseudogenes, for instance), and then save the files as EMBL flat files for submission to ENA/SRA. Before submission I run EnaValidator.jar to check the EMBL file for issues. During these checks it reports an error that turns out to be caused by a space after "COORDINATES:" in the qualifier value. When I remove the space manually in Artemis, the error is gone. I don't know where in the Prokka pipeline this space gets inserted, but it would be helpful to fix this if possible.

tseemann commented 9 years ago

@wksmits This bug is in the "tbl2asn" tool, which NCBI provides and Prokka uses to write the .GBK file. I have emailed NCBI without luck.

I should probably write a post-processing script in Prokka to "fix" the Genbank file, which could also add proper VERSION and ACCESSION values. I also have difficulty coaxing "tbl2asn" to do what I want.

tseemann commented 9 years ago

This still isn't fixed in the latest tbl2asn, so I've added a sed filter to fix it myself.
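For readers who want to apply a similar fix outside Prokka, the same idea might look roughly like this in Python. This is only a sketch: the function name `fix_coordinates_spacing` and the regular expression are assumptions made for illustration, not the sed filter actually added to Prokka, and it assumes the qualifier sits on a single line.

```python
import re

# Hypothetical helper, not Prokka's actual code: drop the whitespace that
# tbl2asn leaves after "COORDINATES:" inside /inference qualifiers.
_COORDINATES_SPACE = re.compile(r'(/inference="COORDINATES:)[ \t]+')


def fix_coordinates_spacing(genbank_text: str) -> str:
    """Rewrite 'COORDINATES: x' as 'COORDINATES:x' in /inference qualifiers."""
    return _COORDINATES_SPACE.sub(r"\1", genbank_text)
```

Lines that are already correct pass through unchanged, so the function can safely be run more than once on the same file.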

peterjc commented 8 years ago

Good to know this is fixed in Prokka 1.11 :)

I just hit this with GenBank output from Prokka 1.10 using the EMBL flat file validator embl-client.jar dated 2015/09/10 (which has no version information) from ftp://ftp.ebi.ac.uk/pub/databases/ena/lib/ as documented on http://www.ebi.ac.uk/ena/software/flat-file-validator (presumably a renamed version of EnaValidator.jar).

The sed fix copied from https://github.com/tseemann/prokka/commit/94d1b057aafecde55457d947c9a52e8ab7dec494 resolved this error:

ERROR: Feature qualifier "inference" does not contain one of the permitted values - " profile" is not permitted (QualifierCheck-4)  line: 21 of sample.gbk

from line 21 in my file:

                     /inference="COORDINATES: profile:Aragorn:1.2"

which should be:

                     /inference="COORDINATES:profile:Aragorn:1.2"
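To find affected lines before running the validator, a quick scan along these lines may help. It is a minimal sketch under the assumption that each /inference qualifier fits on one line; the name `find_bad_inference_lines` is hypothetical and is not part of Prokka or the ENA validator.

```python
import re

# Hypothetical check: matches an /inference value with whitespace
# directly after "COORDINATES:".
_BAD_INFERENCE = re.compile(r'/inference="COORDINATES:\s')


def find_bad_inference_lines(genbank_text: str) -> list[int]:
    """Return 1-based numbers of lines with a space after 'COORDINATES:'."""
    return [
        number
        for number, line in enumerate(genbank_text.splitlines(), start=1)
        if _BAD_INFERENCE.search(line)
    ]
```

Reading sample.gbk into a string and passing it to this function should return the numbers of the lines that the QualifierCheck-4 error would point at.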