nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

Profiler: mOTUs #14

Closed by jfy133 2 years ago

jfy133 commented 2 years ago

Description of feature

Phylogenetic markers are genes that can be used to reconstruct the evolutionary history of organisms and to profile the taxonomic composition of environmental samples. Efforts to find a good set of protein-coding phylogenetic marker genes led to the identification of 40 universal marker genes (MGs) [1,2]. These 40 MGs occur in single copy in the vast majority of known organisms and they have been used to delineate prokaryotic organisms at the species level [3].

We developed the mOTU profiler as a successor of the original version described in [4]. It uses 10 of the 40 MGs to taxonomically profile shotgun metagenomes, to quantify metabolically active members in metatranscriptomes, and to quantify differences between strain populations using single nucleotide variation (SNV) profiles. We extracted the MGs from ~86,000 prokaryotic reference genomes and more than 3,100 publicly available metagenomes (from major human body sites, gut microbiome samples from disease association studies, and ocean water samples). Clustering of the MGs produced a database of MG-based operational taxonomic units (mOTUs) containing 2,297 metagenomic mOTUs (meta-mOTUs) and 11,915 reference mOTUs (ref-mOTUs). For the most recent version (2.6) we extended the database by 19,358 mOTUs (ext-mOTUs) using MGs from ~600,000 metagenome-assembled genomes from 23 environments (mouse, cat, dog, pig, freshwater, wastewater, air, ...). Alignments against this database are then used to taxonomically classify reads, to identify metabolically active members, and to profile sub-species level SNVs.
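To make the idea of marker-gene-based profiling more concrete, here is a minimal Python sketch. It is not the mOTUs implementation: the input counts, the mOTU names, the `profile` function, and the `min_detected_mgs` threshold are all hypothetical, chosen only to illustrate how per-marker-gene read counts could be collapsed into one relative abundance per mOTU.

```python
from statistics import median

# Hypothetical toy input: reads assigned to each of 10 marker genes (MGs)
# per mOTU. Names and numbers are invented for illustration only.
counts = {
    "ref_mOTU_A": [12, 10, 11, 0, 13, 9, 12, 10, 11, 12],
    "meta_mOTU_B": [4, 5, 0, 0, 6, 5, 4, 0, 5, 4],
    "ext_mOTU_C": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
}


def profile(mg_counts, min_detected_mgs=3):
    """Collapse per-MG read counts into relative abundances per mOTU.

    Assumption (not taken from the mOTUs source): an mOTU is reported only
    if at least `min_detected_mgs` of its marker genes received reads, and
    its abundance is the median count over the detected marker genes.
    """
    abundances = {}
    for motu, per_mg in mg_counts.items():
        detected = [c for c in per_mg if c > 0]
        if len(detected) >= min_detected_mgs:
            abundances[motu] = median(detected)

    # Normalise so that the reported abundances sum to 1.
    total = sum(abundances.values())
    if total == 0:
        return {}
    return {motu: value / total for motu, value in abundances.items()}


if __name__ == "__main__":
    for motu, rel_abundance in profile(counts).items():
        print(f"{motu}\t{rel_abundance:.4f}")
```

In this toy example, `ext_mOTU_C` is dropped because only one of its marker genes received reads. Requiring support from several single-copy marker genes is one simple way to reduce spurious detections, and the threshold is a tunable assumption here.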

https://motu-tool.org/

jfy133 commented 2 years ago

Completed in https://github.com/nf-core/taxprofiler/pull/101