On a genome that has a coding gene overlapping circular sequence origin, Bakta terminates with error.
I run Bakta version 1.9.4 installed with conda, database version 5.1.
The command executed is:
bakta --debug --db /mnt/data/ref/Bakta/v5.1/db --output test_bakta --prefix NZ_CP0128315.bakta --threads 8 --regions sequence.gb sequence.fasta
Input files sequence.gb and sequence.fasta for the NCBI sequence NZ_CP012831.1 were downloaded from https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012831.1 and contain full Genbank record and FASTA-formatted nucleotide sequence.
This sequence has a gene and CDS features overlapping sequence origin:
gene complement(join(7118734..7119102,1..405))
/locus_tag="AO356_RS00005"
/old_locus_tag="AO356_00005"
CDS complement(join(7118734..7119102,1..405))
/locus_tag="AO356_RS00005"
/old_locus_tag="AO356_00005"
/EC_number="[1.1.1.14](https://enzyme.expasy.org/EC/1.1.1.14)"
/inference="COORDINATES: similar to AA
sequence:RefSeq:WP_004415564.1"
/GO_function="[GO:0016491 - oxidoreductase activity](http://amigo.geneontology.org/amigo/term/GO:0016491)
[Evidence IEA]"
/note="Derived by automated computational analysis using
gene prediction method: Protein Homology."
/codon_start=1
/transl_table=[11](https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG11)
/product="L-iditol 2-dehydrogenase"
/protein_id="[WP_013692788.1](https://www.ncbi.nlm.nih.gov/protein/503458127)"
/translation="MKRLEGKSALITGSARGIGRSFAQAYIGEGATVAIADINLERAQ
ATASELGPQAYAVEMDVTRQASIDAAIAEVVAHAGKLDILVNNAALFDLAPITEITRE
SYDKLFAINVSGTLFTLQAAAKQMISQGHGGKIINMASQAGRRGEALVGVYCATKAAV
ISLTQSAGLNLIKHKINVNAIAPGVVDGEHWDGVDALFAKHENRPLGEKKRLVGLEVP
YGRMGTAEDLVGMAIFLAGSESDYIVAQTYNVDGGNWMS"
Error message from NZ_CP0128315.bakta.log is:
11:32:33.190 - DEBUG - MAIN - revise translational exceptions
11:32:33.202 - DEBUG - MAIN - import user-provided CDS regions
11:32:34.399 - ERROR - CDS - user-provided CDS: CDS could not be translated into a valid amino acid sequence! contig=contig_1, start=1, stop=7119102, cds=ATCAACATGG...[7119092 symbols deleted]
11:32:34.428 - ERROR - CDS - user-provided CDS: regions/features file GenBank format not valid!
Traceback (most recent call last):
File "/home/aekazakov/tools/anaconda3/envs/bakta/lib/python3.10/site-packages/bakta/features/cds.py", line 290, in import_user_cdss
aa = str(Seq(nt).translate(table=cfg.translation_table, cds=True))
File "/home/aekazakov/tools/anaconda3/envs/bakta/lib/python3.10/site-packages/Bio/Seq.py", line 1620, in translate
_translate_str(str(self), table, stop_symbol, to_stop, cds, gap=gap)
File "/home/aekazakov/tools/anaconda3/envs/bakta/lib/python3.10/site-packages/Bio/Seq.py", line 2871, in _translate_str
raise CodonTable.TranslationError(
Bio.Data.CodonTable.TranslationError: Final codon 'ATC' is not a stop codon
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/aekazakov/tools/anaconda3/envs/bakta/lib/python3.10/site-packages/bakta/features/cds.py", line 295, in import_user_cdss
raise ValueError(f"User-provided CDS could not be translated into a valid amino acid sequence! contig={user_cds['contig']}, start={user_cds['start']}, stop={user_cds['stop']}, cds={nt}")
ValueError: User-provided CDS could not be translated into a valid amino acid sequence! contig=contig_1, start=1, stop=7119102, cds=ATCAACATGG...[7119092 symbols deleted]
This error occurs because the CDS sequence extracted by extract_feature_sequence (bakta/utils.py) is wrong. It is 7.1 Mbp long instead of 774 bp.
Hi! Thank you for the great tool!
On a genome that has a coding gene overlapping circular sequence origin, Bakta terminates with error.
I run Bakta version 1.9.4 installed with conda, database version 5.1.
The command executed is: bakta --debug --db /mnt/data/ref/Bakta/v5.1/db --output test_bakta --prefix NZ_CP0128315.bakta --threads 8 --regions sequence.gb sequence.fasta
Input files sequence.gb and sequence.fasta for the NCBI sequence NZ_CP012831.1 were downloaded from https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012831.1 and contain full Genbank record and FASTA-formatted nucleotide sequence.
This sequence has a gene and CDS features overlapping sequence origin:
Error message from NZ_CP0128315.bakta.log is:
This error occurs because the CDS sequence extracted by extract_feature_sequence (bakta/utils.py) is wrong. It is 7.1 Mbp long instead of 774 bp.