oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

Missing genes in both faa files #317

Closed EdderDaniel closed 1 month ago

EdderDaniel commented 2 months ago

Hi!

I ran the standard annotation command with bakta 1.9.4 (installed with conda), and after looking at the outputs I noticed that some genes are missing from the faa files (both the hypotheticals and the regular faa file) but are present in all of the other annotation files. That seems really odd: if a gene is reported in the gbff, there shouldn't be a reason for it to be left out of the faa, right? Or is there a reason for this?

oschwengers commented 2 months ago

Hi @EdderDaniel, thanks for reporting. Yes, if this is reproducible, then it's indeed odd and not intended. Could you provide a genome sequence for a reproducible example? I'd love to take a closer look into this.

EdderDaniel commented 2 months ago

Sure! I'm sending you a tar.gz file with the fna, tsv, faa, log and gbff. The genome that I used is Aetokthonos_hydrillicola_CCALA_1050, which I got from NCBI, and the bakta options that I used were meta compliant and keep-contig-headers.

Thanks for the help!

Aetokthonos_hydrillicola_CCALA_1050.tar.gz

EdderDaniel commented 1 month ago

Hi Oliver! Just dropping by to say that I've been looking into this, and it appears that the missing genes are always RNA related (ribosomal, tRNA, etc.). Maybe that'll help to pinpoint the bug?

oschwengers commented 1 month ago

Hi. I haven't looked into it yet, but if you can already confirm that all missing genes are actually RNA genes, then they should be excluded from the *.faa files for a good reason: they're not translated.

If this explanation is too obvious - then in that case, I'm sorry and I might be missing something.

EdderDaniel commented 1 month ago

Ha! No, you are completely right. For some reason I confused faa and ffn in my mind. I'll close the issue now. Sorry for the confusion!
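
For anyone landing here with the same question, here is a rough sketch of how the check described above could be done: list every feature in the tsv whose locus tag has no record in the faa, and count them by feature type. This is not part of bakta. The function names are made up for illustration, and the sketch assumes that the tsv keeps the feature type in the second column and the locus tag in the sixth, and that each faa header starts with the locus tag. Adjust the column indices if your files look different. The sample data is invented.

```python
from collections import Counter


def fasta_ids(fasta_text):
    """Return the set of record IDs (first token after '>') found in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_feature_types(tsv_text, faa_text, type_col=1, locus_col=5):
    """Count, per feature type, the TSV rows whose locus tag is absent from the FASTA.

    The default column positions are an assumption about the TSV layout.
    """
    present = fasta_ids(faa_text)
    counts = Counter()
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) <= max(type_col, locus_col):
            continue  # skip rows that are too short to hold both columns
        locus = fields[locus_col]
        if locus and locus not in present:
            counts[fields[type_col]] += 1
    return counts


# Invented example data, only to show the idea.
SAMPLE_TSV = "\n".join([
    "#Sequence Id\tType\tStart\tStop\tStrand\tLocus Tag\tGene\tProduct",
    "contig_1\tcds\t1\t300\t+\tX_1\tgeneA\tsome protein",
    "contig_1\ttRNA\t400\t475\t+\tX_2\t\ttRNA-Ala",
    "contig_1\tcds\t500\t900\t-\tX_3\t\thypothetical protein",
    "contig_1\trRNA\t1000\t2500\t+\tX_4\t\t16S ribosomal RNA",
])
SAMPLE_FAA = ">X_1 some protein\nMKT\n>X_3 hypothetical protein\nMAA\n"

if __name__ == "__main__":
    print(dict(missing_feature_types(SAMPLE_TSV, SAMPLE_FAA)))
    # → {'tRNA': 1, 'rRNA': 1}
```

If only RNA types show up in the result, as in this thread, nothing is actually missing: the faa holds protein sequences only, so features that are never translated have no entry there.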