Closed aekazakov closed 1 month ago
Hi @aekazakov and thanks for the detailed report!
I'll take a look and try to include a fix in the upcoming v1.10.0 release.
OK, this should be fixed by #332.
@aekazakov If you like, and are comfortable being a very early tester, you could check out the fix-edge-user-proteins branch at https://github.com/oschwengers/bakta/tree/fix-edge-user-proteins and give it a try.
I tested the fix-edge-user-proteins branch with two RefSeq genomes (NZ_CP012831.1 and NZ_CP015511.1) that have gene and CDS features overlapping the sequence origin on the - and + strands, respectively. For both genomes, the GenBank files were imported without errors, and the genes overlapping the origin were reported correctly in the Bakta output files. Thank you for fixing this bug!
Thanks for testing and reporting back. With that, I'll close this. If there is anything further to discuss, or if related issues come up, please do not hesitate to re-open this issue or open a new one. Thanks.
Hi! Thank you for the great tool!
On a genome that has a coding gene overlapping the origin of a circular sequence, Bakta terminates with an error.
I am running Bakta version 1.9.4, installed with conda, with database version 5.1.
The command executed is: bakta --debug --db /mnt/data/ref/Bakta/v5.1/db --output test_bakta --prefix NZ_CP0128315.bakta --threads 8 --regions sequence.gb sequence.fasta
The input files sequence.gb and sequence.fasta for the NCBI sequence NZ_CP012831.1 were downloaded from https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012831.1 and contain the full GenBank record and the FASTA-formatted nucleotide sequence, respectively.
This sequence has gene and CDS features overlapping the sequence origin:
Error message from NZ_CP0128315.bakta.log is:
This error occurs because the CDS sequence extracted by extract_feature_sequence (bakta/utils.py) is wrong. It is 7.1 Mbp long instead of 774 bp.
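To illustrate the kind of length mismatch described above, here is a minimal sketch of extracting a feature that wraps around the origin of a circular sequence. This is not Bakta's actual code: the function name `extract_wrapping_feature` is hypothetical, and it assumes 1-based inclusive coordinates where an origin-spanning feature is given with start > stop. A naive slice between the smaller and larger coordinate returns the long span between them instead of the short feature.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def extract_wrapping_feature(seq: str, start: int, stop: int, strand: str = "+") -> str:
    """Hypothetical helper, not part of Bakta.

    Assumes 1-based inclusive coordinates on a circular sequence.
    start > stop is taken to mean the feature spans the origin.
    """
    if start <= stop:
        # Ordinary feature: a single contiguous slice.
        sub = seq[start - 1:stop]
    else:
        # Origin-spanning feature: end of the sequence plus its beginning.
        sub = seq[start - 1:] + seq[:stop]
    return reverse_complement(sub) if strand == "-" else sub


def naive_extract(seq: str, start: int, stop: int) -> str:
    """Naive slice that ignores circularity (illustrates the wrong length)."""
    lo, hi = min(start, stop), max(start, stop)
    return seq[lo - 1:hi]


# Toy 12 bp "circular genome"; the feature covers positions 10..12 and 1..3.
genome = "ATGCCCGGGTTA"
wrapped = extract_wrapping_feature(genome, 10, 3, "+")
wrapped_minus = extract_wrapping_feature(genome, 10, 3, "-")
naive = naive_extract(genome, 10, 3)
```

On the toy sequence, the wrapped extraction yields 6 bases (3 from the end plus 3 from the start), while the naive slice yields the 8 bases lying between the two coordinates. Scaled up to a multi-megabase chromosome, the same mistake would produce a sequence that is nearly genome-sized instead of a few hundred bases.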