Closed jvera888 closed 1 month ago
Hi @jvera888, thanks for the idea. I'm considering adding more verbose output to the GenBank/EMBL/GFF3 files. I just have to think about how to organize the CLI. Maybe it's best to hook that behavior to the existing --verbose flag...
This would be super duper useful I think, especially as a separate .tsv file :)
George
OK, I'm currently working on a solution. As @jvera888 already mentioned, I'm a bit reluctant to write all available inference information to the *.gbff/*.embl/*.gff3 files. We have varying subsets of bit score, E-value, query coverage, subject coverage and sequence identity, and I feel that it's just too much to store everything in the attribute fields, while selecting only certain fields would be a bad compromise.
Hence, I tend towards @gbouras13's idea of an extra TSV output file, <prefix>.inference.tsv, with the following format:
contig-id feat-type start stop strand locus-tag score evalue query-cov subject-cov identity
This wouldn't clutter up the *.gbff/*.embl/*.gff3 files and would still provide all the information Bakta has.
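To make the proposed layout a bit more concrete, here is a minimal Python sketch of how such a file could be read downstream. It assumes the columns are tab-separated and appear in the order listed above, and that scores a given feature doesn't have are left empty; the function name `read_inference_tsv` and the example rows are made up for illustration and are not actual Bakta output.

```python
import csv
import io

# Column names taken from the proposed layout; whether the real file carries a
# header line, or uses exactly these names, is an assumption.
COLUMNS = [
    "contig-id", "feat-type", "start", "stop", "strand", "locus-tag",
    "score", "evalue", "query-cov", "subject-cov", "identity",
]
NUMERIC = ("score", "evalue", "query-cov", "subject-cov", "identity")


def read_inference_tsv(handle):
    """Yield one dict per feature; empty or missing numeric fields become None."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # Skip comment lines and a header line, if either is present.
        if row["contig-id"].startswith("#") or row["contig-id"] == "contig-id":
            continue
        row["start"] = int(row["start"])
        row["stop"] = int(row["stop"])
        for key in NUMERIC:
            row[key] = float(row[key]) if row[key] else None
        yield row


# Made-up rows: a CDS with all scores, and a tRNA with only a score.
example = (
    "contig_1\tcds\t100\t400\t+\tABC_00001\t250.0\t1e-50\t0.98\t0.95\t0.91\n"
    "contig_1\ttRNA\t500\t575\t-\tABC_00002\t61.2\t\t\t\t\n"
)
rows = list(read_inference_tsv(io.StringIO(example)))
```

Mapping empty fields to `None` reflects the "varying subsets" point above: not every feature type will have every score, so a consumer has to cope with gaps either way.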
@jvera888, @gbouras13, @ndombrowski (and anyone else): any thoughts, ideas or comments? Any feedback is highly welcome!
BTW, is there a better term than "inference", and a better suffix for the new file than *.inference.tsv?
OK, this is implemented in #331. Thanks again for the idea, comments and feedback!
Hi,
Bakta is a superb annotation tool, so thanks for all the hard work! However, there does not appear to be a way to include any annotation scores (e.g. E-value, percent identity, etc.) in the output (DFAST, for example, does this), unless I'm missing something very obvious (not unheard of).
I realize not everyone wants their genbanks and GFFs cluttered up with this extra info, but an option to have these included in some way would be great for those of us who need that extra level of certainty.
Thanks, Cris