oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

Origin of replication detection does not match Ori-Finder web server #266

Open Jigyasa3 opened 8 months ago

Jigyasa3 commented 8 months ago

Hi Bakta developers!

Thank you for a great tool for plasmid and bacterial genome annotation! I am using Bakta specifically to annotate the origin of replication. In some of my plasmid genomes, Bakta cannot find the origin of replication, while Ori-Finder detects it. I understand that, under the hood, Bakta uses a BLAST search to find the origin of replication. Would it be possible to extend the analysis to also incorporate GC skew and the location of repeat regions? I know I can use Ori-Finder for this, but it is a web tool, and I need to identify the origin of replication in thousands of plasmid genomes.

Any suggestions would be super helpful!

Examples:

Genome ID: NC_011798.1
Bakta: NULL
Ori-Finder: 0 ... 171 nt

Genome ID: CP080583.1
Bakta: NULL
Ori-Finder: 18,083 ... 19,317 nt
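
To make the GC-skew idea mentioned above concrete, here is a minimal Python sketch. It is an illustration only: it is not Ori-Finder's method and not something Bakta implements, and the function names `cumulative_gc_skew` and `skew_minimum_position` as well as the default window size are assumptions made for this example. It computes (G - C) / (G + C) in non-overlapping windows, accumulates the values, and reports where the cumulative curve reaches its minimum, which is the classic heuristic for a candidate origin. On short sequences such as small plasmids the signal can be weak or absent, so the result should be treated as a hint at best.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Cumulative GC skew, one value per non-overlapping window."""
    seq = seq.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:  # skip windows without any G or C to avoid dividing by zero
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


def skew_minimum_position(seq: str, window: int = 1000) -> int:
    """Approximate coordinate (end of window) where the cumulative skew is lowest."""
    skew = cumulative_gc_skew(seq, window)
    idx = min(range(len(skew)), key=skew.__getitem__)
    return min((idx + 1) * window, len(seq))
```

For example, `skew_minimum_position(sequence, window=500)` returns one candidate coordinate per replicon. The resolution is limited to the window size, and a circular sequence may need to be rotated or scanned twice so that a minimum near the linearization point is not missed.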

oschwengers commented 7 months ago

Hi @Jigyasa3, thanks for reaching out with this. Yes, you're perfectly right: Bakta conducts a simple BLAST search of the oriC/oriV sequences from the DoriC database, followed by a majority-vote algorithm to find the best hit. It goes without saying that this simple approach is by no means as powerful as the one that Ori-Finder uses. Unfortunately, to our knowledge, there is no public version of Ori-Finder. If there is one, or if you find public scripts that we could use, I'd be very open to following that route. Otherwise, there is currently not much that we can do about this. Maybe you could also ask the main developers of Ori-Finder? Please let me know if you find out anything in that regard.
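
For readers wondering what a majority vote over BLAST hits could look like, here is a hypothetical Python sketch. It is not Bakta's actual code: the `Hit` type, the `majority_vote` function, and the rule "cluster overlapping hits and keep the cluster with the most hits" are all assumptions chosen for illustration. It also assumes the hits have already been parsed into contig name and start/end coordinates.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    contig: str
    start: int
    end: int


def majority_vote(hits: list[Hit]) -> Optional[tuple[int, str, int, int]]:
    """Return (votes, contig, start, end) for the region backed by the most hits."""
    by_contig: dict[str, list[Hit]] = {}
    for hit in hits:
        by_contig.setdefault(hit.contig, []).append(hit)

    best = None
    for contig, contig_hits in by_contig.items():
        contig_hits.sort(key=lambda h: h.start)
        start, end, votes = contig_hits[0].start, contig_hits[0].end, 1
        clusters = []
        for hit in contig_hits[1:]:
            if hit.start <= end:  # overlaps the current cluster: extend it
                end = max(end, hit.end)
                votes += 1
            else:  # gap: close the current cluster and start a new one
                clusters.append((votes, contig, start, end))
                start, end, votes = hit.start, hit.end, 1
        clusters.append((votes, contig, start, end))
        for cluster in clusters:
            if best is None or cluster[0] > best[0]:
                best = cluster
    return best
```

With three hits, two of them overlapping at 100 to 320 on the same contig and one isolated hit further along, the sketch reports the overlapping region with two votes. A real implementation would likely also weigh identity and coverage rather than counting hits alone.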