waldronlab / curatedMetagenomicData

Curated Metagenomic Data of the Human Microbiome
https://waldronlab.io/curatedMetagenomicData
Artistic License 2.0

Incorporate GTDB Taxonomy v214 into curatedMetagenomicData #301

Open camilagazolla opened 7 months ago

camilagazolla commented 7 months ago

Dear curatedMetagenomicData Maintainers,

Considering that GTDB taxonomy is a cornerstone for microbial genomics research, I was wondering if there is a possibility to also provide GTDB taxonomy labels, particularly the latest version (v214), within the data from the curatedMetagenomicData package.

If the incorporation is currently out of scope, could you advise on the best approach for users to translate the current MetaPhlAn3 labels to the GTDB labels?

Thank you for considering this feature request.

lwaldron commented 7 months ago

I'm looking for guidance on the Biobakery forum: https://forum.biobakery.org/t/converting-metaphlan3-profiles-to-gtdb/6292

wshuai294 commented 4 months ago

I am also wondering if there is a way to convert the MetaPhlAn3 taxonomy to the GTDB taxonomy.

lwaldron commented 4 months ago

I re-posted my question on the bioBakery forum. Pinging @seandavi to request adding GTDB profiles to the next version (apparently available as of MetaPhlAn 4.0.6 - https://github.com/biobakery/MetaPhlAn/releases), pending investigation of how much computation it will add.

lwaldron commented 4 months ago

I just spoke with a member of the MetaPhlAn development team. The translation tool (https://github.com/biobakery/MetaPhlAn/blob/master/metaphlan/utils/sgb_to_gtdb_profile.py) isn't implemented for MetaPhlAn3, because the MetaPhlAn3-to-GTDB mapping is not directly n:1. For cMD4, which uses MetaPhlAn4, the mapping will be straightforward:

  1. direct substitution when the mapping is 1:1
  2. binning if the mapping is n:1 (n>1)
  3. re-normalizing to make sample sums add to 1
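
For anyone who wants to see the three steps concretely, here is a rough Python sketch. It is not the code in `sgb_to_gtdb_profile.py`; the function name, the SGB IDs, the GTDB labels, and the mapping dictionary are all made up for illustration. It also assumes that SGBs without a GTDB assignment are simply dropped, which is one reason the final re-normalization would be needed.

```python
from collections import defaultdict


def convert_profile_to_gtdb(profile, sgb_to_gtdb):
    """Convert {sgb_id: relative_abundance} into {gtdb_label: relative_abundance}.

    `profile` and `sgb_to_gtdb` are plain dicts; both are hypothetical inputs,
    not the file formats MetaPhlAn actually reads or writes.
    """
    binned = defaultdict(float)
    for sgb_id, abundance in profile.items():
        gtdb_label = sgb_to_gtdb.get(sgb_id)
        if gtdb_label is None:
            # Assumption: SGBs with no GTDB assignment are dropped.
            continue
        # Steps 1 and 2: a 1:1 mapping is a direct substitution, and an
        # n:1 mapping sums (bins) all SGBs that share the same GTDB label.
        binned[gtdb_label] += abundance

    # Step 3: re-normalize so the sample sums to 1.
    total = sum(binned.values())
    if total == 0:
        return {}
    return {label: value / total for label, value in binned.items()}


# Toy sample: SGB2 and SGB3 map to the same GTDB label, SGB4 is unmapped.
example_profile = {"SGB1": 0.50, "SGB2": 0.25, "SGB3": 0.15, "SGB4": 0.10}
example_mapping = {
    "SGB1": "s__Species_A",
    "SGB2": "s__Species_B",
    "SGB3": "s__Species_B",
}
example_result = convert_profile_to_gtdb(example_profile, example_mapping)
```

In the toy sample, `s__Species_A` keeps its single SGB, `s__Species_B` receives the summed abundance of two SGBs, and both are rescaled after the unmapped SGB is removed.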

Let's keep this as a wishlist item for cMD4, where it will be relatively straightforward to add.