jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0
346 stars 81 forks

Add the taxonomic category "Kingdom" for eukaryotes #747

Open glenjasper opened 8 months ago

glenjasper commented 8 months ago

It would be fantastic if a future update could add the taxonomic category "Kingdom". This category applies not only to eukaryotes (kingdom: Fungi, Animalia, Viridiplantae, Metazoa, etc.) but also to viruses. Of course, it does not apply to Bacteria and Archaea, which would have the value NA. The Kingdom category is well defined in the NCBI Taxonomy database.

Best, Glen

fpusan commented 8 months ago

This is in principle possible.

Judging from the LCA_tax/parents.txt file in the SqueezeMeta database folder, kingdom is indeed a well-defined field.

See e.g.

Dangeardiella macrospora superkingdom:Eukaryota;clade:Opisthokonta;kingdom:Fungi;subkingdom:Dikarya;phylum:Ascomycota;clade:saccharomyceta;subphylum:Pezizomycotina;clade:leotiomyceta;clade:dothideomyceta;class:Dothideomycetes;no rank:Dothideomycetes incertae sedis;genus:Dangeardiella;species:Dangeardiella macrospora 100009
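To make the point concrete, here is a minimal Python sketch of how the kingdom could be read out of a lineage string like the one above. The function names `parse_lineage` and `get_rank` are made up for illustration and are not part of SqueezeMeta; the sketch only assumes that the lineage is a `;`-separated list of `rank:name` fields, as in the example record.

```python
# Lineage string taken from the example record above.
LINEAGE = (
    "superkingdom:Eukaryota;clade:Opisthokonta;kingdom:Fungi;"
    "subkingdom:Dikarya;phylum:Ascomycota;clade:saccharomyceta;"
    "subphylum:Pezizomycotina;clade:leotiomyceta;clade:dothideomyceta;"
    "class:Dothideomycetes;no rank:Dothideomycetes incertae sedis;"
    "genus:Dangeardiella;species:Dangeardiella macrospora"
)


def parse_lineage(lineage):
    """Split 'rank:name;rank:name;...' into an ordered list of (rank, name) pairs.

    Hypothetical helper. A list is used instead of a dict because some
    ranks (e.g. 'clade') appear more than once in the same lineage.
    """
    pairs = []
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        rank, _, name = field.partition(":")
        pairs.append((rank.strip(), name.strip()))
    return pairs


def get_rank(pairs, rank):
    """Return the first name found at the given rank, or None if it is absent."""
    for found_rank, name in pairs:
        if found_rank == rank:
            return name
    return None


pairs = parse_lineage(LINEAGE)
print(get_rank(pairs, "superkingdom"), get_rank(pairs, "kingdom"))
```

For this record the lookup finds `Fungi` at the kingdom rank; for a lineage without a `kingdom:` field it would return `None`.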

We could maybe make Bacteria and Archaea take the same value as in the superkingdom rank, instead of assigning NA. On the one hand this is not technically correct; on the other hand it would make our life easier (e.g. running plotTaxonomy at the kingdom level would break if we have NAs around).
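The two options can be sketched side by side in Python. This is only an illustration of the trade-off, not how SqueezeMeta or SQMtools do it: `rank_value` and `kingdom_label` are hypothetical names, and both lineages are shortened, illustrative strings rather than records copied from the database.

```python
def rank_value(lineage, rank):
    """Return the first name at `rank` in a 'rank:name;...' string, else None."""
    for field in lineage.split(";"):
        found_rank, _, name = field.strip().partition(":")
        if found_rank == rank:
            return name
    return None


def kingdom_label(lineage, fallback_to_superkingdom):
    """Pick the value to report at the kingdom level (hypothetical helper).

    With fallback_to_superkingdom=True, taxa lacking a kingdom reuse their
    superkingdom name; with False they get None (i.e. NA).
    """
    kingdom = rank_value(lineage, "kingdom")
    if kingdom is not None:
        return kingdom
    if fallback_to_superkingdom:
        return rank_value(lineage, "superkingdom")
    return None


# Shortened, illustrative lineages (assumptions, not database records).
eukaryote = "superkingdom:Eukaryota;clade:Opisthokonta;kingdom:Fungi;phylum:Ascomycota"
prokaryote = "superkingdom:Bacteria;phylum:Cyanobacteria"

for fallback in (True, False):
    print(fallback, kingdom_label(eukaryote, fallback), kingdom_label(prokaryote, fallback))
```

With the fallback enabled every taxon gets a non-missing label at the kingdom level, at the cost of reporting a kingdom that does not exist in NCBI; without it, downstream code has to cope with missing values.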

This change would involve changes in the database creation step, several SqueezeMeta scripts, and SQMtools. I don't foresee the individual changes to be very big but added up this is a somewhat large undertaking. @jtamames what do you think?

jtamames commented 8 months ago

Hello. It would be technically possible, but it will break several things. We would have to redefine the accepted levels for taxonomy and tweak several scripts to accept the change.

I would keep Bacteria and Archaea as NA for that rank rather than creating a non-existent kingdom rank for them. You are probably already dealing with NAs somehow, @fpusan, since some taxa don't have some intermediate ranks (I am thinking of cyanobacteria here).

I will think about this for future versions.

Best, J

fpusan commented 8 months ago

Ah yes, I actually do. I add a "no rank in NCBI" tag to those cases.
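As a rough illustration of that idea, the Python sketch below fills intermediate ranks that are missing from a lineage with a placeholder built from the closest known parent. It is not the actual SqueezeMeta or SQMtools code: the rank list (with kingdom already added), the function name `fill_missing_ranks`, the exact placeholder wording and the example lineage are all assumptions made for the example.

```python
# Hypothetical list of accepted levels, with "kingdom" already included.
RANKS = ["superkingdom", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


def fill_missing_ranks(lineage, ranks=RANKS):
    """Map every accepted rank to a name, tagging gaps in the lineage.

    Ranks missing above the lowest classified rank get a placeholder such
    as 'Bacteria (no kingdom in NCBI)'. Ranks below the lowest classified
    rank are left as None, since they are unclassified rather than absent.
    """
    found = {}
    for field in lineage.split(";"):
        rank, _, name = field.strip().partition(":")
        if rank in ranks and rank not in found:
            found[rank] = name

    # Index of the lowest rank that is actually present (-1 if none).
    lowest = max((i for i, r in enumerate(ranks) if r in found), default=-1)

    filled = {}
    last_known = None
    for i, rank in enumerate(ranks):
        if rank in found:
            last_known = found[rank]
            filled[rank] = last_known
        elif i < lowest and last_known is not None:
            filled[rank] = f"{last_known} (no {rank} in NCBI)"
        else:
            filled[rank] = None
    return filled


# Shortened, illustrative lineage (an assumption, not a database record).
example = "superkingdom:Bacteria;phylum:Cyanobacteria;order:Synechococcales;genus:Synechococcus"
for rank, name in fill_missing_ranks(example).items():
    print(rank, name, sep="\t")
```

Under this scheme a bacterial lineage never produces a bare NA at the kingdom level; it gets a tagged value derived from its superkingdom, which keeps the label honest while avoiding missing values in plots.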

glenjasper commented 8 months ago

Excellent. After further analysis, perhaps you will decide to implement it.

Best, Glen