rnajena / viralclust

Small pipeline to cluster viral genomes based on their k-mer content. WiP
GNU General Public License v3.0
15 stars, 4 forks

NCBI annotation via Regex #7

Closed. klamkiew closed this issue 3 years ago

klamkiew commented 3 years ago

*sigh* Apparently, the taxonomic lineage isn't consistent among viruses. Sometimes the species is the last entry, sometimes it is second to last, and every other rank gets shifted as well.

A regex that looks for the family and genus suffixes (-viridae and -virus) should help to avoid mis-annotations in my results.
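A minimal sketch of the suffix idea, not the code used in viralclust. It assumes the lineage is available as a single semicolon-separated string; the function name `family_and_genus` and both patterns are hypothetical. Matching only single-word entries is an extra assumption here, so that multi-word species names ending in "virus" are not mistaken for a genus. A subgenus, which also ends in -virus, would still match if no genus precedes it.

```python
import re

# Hypothetical patterns: a single word ending in the rank suffix.
FAMILY_RE = re.compile(r"[A-Za-z]+viridae")
GENUS_RE = re.compile(r"[A-Za-z]+virus")


def family_and_genus(lineage):
    """Return (family, genus) found by suffix, independent of position.

    Assumes `lineage` is a semicolon-separated string. Either value
    is None if no entry carries the expected suffix.
    """
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    # fullmatch rejects entries with spaces, e.g. "Zika virus".
    family = next((r for r in ranks if FAMILY_RE.fullmatch(r)), None)
    genus = next((r for r in ranks if GENUS_RE.fullmatch(r)), None)
    return family, genus


if __name__ == "__main__":
    example = "Viruses; Riboviria; Flaviviridae; Flavivirus; Zika virus"
    print(family_and_genus(example))
```

Because the lookup goes by suffix rather than by index, the result stays the same whether the species is the last entry or the second to last.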

klamkiew commented 3 years ago

A regex is most likely not needed, but the issue itself remains.