oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

option to include scores in bakta output #314

Closed jvera888 closed 1 month ago

jvera888 commented 2 months ago

Hi,

Bakta is a superb annotation tool, so thanks for all the hard work! However, there does not appear to be a way to include any annotation scores (e.g. evalue, percent identity, etc.) in the output, as DFAST does, unless I'm missing something very obvious (not unheard of).

I realize not everyone wants their genbanks and GFFs cluttered up with this extra info, but an option to have these included in some way would be great for those of us who need that extra level of certainty.

Thanks, Cris

oschwengers commented 2 months ago

Hi @jvera888, thanks for the idea. I'm considering adding a more verbose output to the GenBank/EMBL/GFF3 files. I just have to think about how to organize the CLI. Maybe it's best to hook that behavior to the existing --verbose flag...

gbouras13 commented 1 month ago

This would be super duper useful I think, especially as a separate .tsv file :)

George

oschwengers commented 1 month ago

OK, I'm currently working on a solution. As @jvera888 already mentioned, I'm a bit reluctant to write all available inference information to the *.gbff/*.embl/*.gff3 files. We have varying subsets of bitscore, evalue, query coverage, subject coverage and sequence identity, and I feel that it's just too much to store everything in the attribute fields, while selecting only certain fields would be a bad compromise.

Hence, I tend towards @gbouras13's idea of an extra <prefix>.inference.tsv TSV output file in the following format:

contig-id  feat-type  start  stop  strand  locus-tag  score  evalue  query-cov  subject-cov  identity

This wouldn't clutter up the *.gbff/*.embl/*.gff3 files and would still provide all the information Bakta has.
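For anyone wanting to consume such a file downstream, here is a minimal Python sketch of a reader. It assumes the columns are exactly those proposed above and in that order, that any header or comment lines start with `#`, and that scores a given feature lacks are left as empty fields. None of this is confirmed by the thread, so the released file may differ. The function name `read_inference_tsv` and the sample rows (including whether coverage and identity are fractions or percentages) are made up for illustration.

```python
import csv
import io

# Column names as proposed in this thread (assumption: order and names
# match the final file; check the actual output before relying on this).
COLUMNS = [
    "contig-id", "feat-type", "start", "stop", "strand", "locus-tag",
    "score", "evalue", "query-cov", "subject-cov", "identity",
]
NUMERIC = ("score", "evalue", "query-cov", "subject-cov", "identity")


def read_inference_tsv(handle):
    """Yield one dict per feature row; missing numeric fields become None."""
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: header/comment lines, if present, start with '#'.
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(COLUMNS, row))
        for key in NUMERIC:
            value = record.get(key, "")
            record[key] = float(value) if value else None
        record["start"] = int(record["start"])
        record["stop"] = int(record["stop"])
        yield record


# Hypothetical sample data, not real Bakta output.
SAMPLE = (
    "#" + "\t".join(COLUMNS) + "\n"
    "contig_1\tcds\t100\t400\t+\tABC_00001\t250.0\t1e-50\t0.98\t0.95\t0.91\n"
    "contig_1\ttRNA\t500\t575\t-\tABC_00002\t60.2\t\t\t\t\n"
)

records = list(read_inference_tsv(io.StringIO(SAMPLE)))

# Keep only features that carry an identity value above a chosen cutoff.
confident = [r for r in records if r["identity"] is not None and r["identity"] >= 0.9]
```

Mapping empty fields to `None` rather than `0.0` keeps "no score available" distinct from "score of zero", which matters because different feature types carry different subsets of these values.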

@jvera888, @gbouras13, @ndombrowski (and anyone else): any thoughts, ideas or comments? Any feedback is highly welcome!

BTW, is there a better term than "inference", and a better suffix for the new file than *.inference.tsv?

oschwengers commented 1 month ago

OK, this is implemented in #331. Thanks again for the idea, comments and feedback!