joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Tax_glom and rarefy_even_depth #1077

Status: Open (laylaeb opened this issue 5 years ago)

laylaeb commented 5 years ago

Hi,

I have a question about using tax_glom and rarefy_even_depth in phyloseq. In my study, after merging the tree, the metadata, the OTU table and the taxonomy, I first ran tax_glom and then rarefy_even_depth. I then computed alpha and beta diversity metrics. Someone told me that it is incorrect to use tax_glom before rarefying, and that tax_glom is tricky and should not be used before computing diversity measures because it underestimates diversity. It is a conservative approach, and I agree that richness drops a lot after tax_glom. However, our reason for using tax_glom was to measure diversity at the level of species, genus, or whichever taxonomic rank we chose, not at the level of the OTUs!

What would be the best approach?

a) tax_glom and then rarefy?

b) rarefy and then tax_glom?

I think we get similar results whether we run tax_glom before or after rarefying.

c) use tax_glom only in certain cases, such as when computing relative abundances or differential abundance (without rarefying in those cases)?

Thank you very much for your help!

mikemc commented 5 years ago

I think whether it matters if you use tax_glom before or after rarefy_even_depth depends on how you processed your sequence data and assigned taxonomy, and on which tax_glom options you use (specifically, NArm). If you did closed-reference OTU mapping, for example, then all of your OTUs should have taxonomy down to genus, and tax_glom to genus will not discard any reads. But if you used DADA2 and assigned taxonomy to ASVs, many ASVs may lack a genus assignment, and tax_glom to genus will discard those ASVs' reads unless you set NArm=FALSE. Dropping reads changes each sample's total, so glomming first can change the depth that rarefy_even_depth subsamples to (by default, the smallest sample sum).
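To make the NArm point concrete, here is a minimal Python sketch on a made-up two-sample table (it is not phyloseq code). The data and the helpers `glom_counts` and `sample_sums` are hypothetical. `narm=True` loosely mimics `tax_glom(..., NArm=TRUE)` by dropping ASVs with no genus. The `narm=False` branch pools all unassigned ASVs under one label. That is a simplification: phyloseq keeps them grouped by their higher-rank lineage.

```python
from collections import defaultdict

# Hypothetical toy data: ASV -> genus (None = no genus assigned, as can
# happen when taxonomy is assigned to DADA2 ASVs).
ASV_GENUS = {
    "ASV1": "Bacteroides",
    "ASV2": "Bacteroides",
    "ASV3": "Prevotella",
    "ASV4": None,
    "ASV5": None,
}
# Hypothetical read counts: sample -> ASV -> reads.
COUNTS = {
    "S1": {"ASV1": 40, "ASV2": 10, "ASV3": 30, "ASV4": 15, "ASV5": 5},
    "S2": {"ASV1": 5, "ASV2": 5, "ASV3": 20, "ASV4": 60, "ASV5": 10},
}


def glom_counts(counts, asv_genus, narm=True):
    """Sum ASV counts per genus (a rough analogue of tax_glom).

    narm=True drops ASVs without a genus, so their reads are lost.
    narm=False keeps them, pooled under a single "NA" label (a
    simplification of phyloseq's actual NArm=FALSE behaviour).
    """
    out = {}
    for sample, asv_counts in counts.items():
        per_genus = defaultdict(int)
        for asv, n in asv_counts.items():
            genus = asv_genus[asv]
            if genus is None:
                if narm:
                    continue  # reads of unassigned ASVs are discarded
                genus = "NA"
            per_genus[genus] += n
        out[sample] = dict(per_genus)
    return out


def sample_sums(table):
    """Total reads per sample (the analogue of phyloseq's sample_sums)."""
    return {s: sum(c.values()) for s, c in table.items()}


print(sample_sums(COUNTS))
print(sample_sums(glom_counts(COUNTS, ASV_GENUS, narm=True)))
print(sample_sums(glom_counts(COUNTS, ASV_GENUS, narm=False)))
```

In this toy table both samples start at 100 reads. After glomming with `narm=True`, S1 keeps 80 reads and S2 keeps only 30, so a "rarefy to the smallest sample" step would subsample to 30 instead of 100. With `narm=False` the totals are unchanged. In that case, glomming only relabels reads, which is consistent with getting similar results in either order.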

As to whether you should compute alpha and beta diversity at a specific taxonomic rank, at the OTU or ASV level, or with a phylogenetic (tree-based) approach: there is no right answer. It depends on your research question and on the biology and properties of the ecosystem you are studying. A safe approach is to try it several ways (e.g., alpha diversity at the family and ASV levels) and see whether they lead to similar conclusions. Document what you did and found in the Methods and in your Rmd/R scripts to avoid unintentional p-hacking.
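As an illustration of comparing levels, the Python sketch below computes observed richness and Shannon diversity (natural log) for one made-up sample, both at the ASV level and after summing reads per family. The data and function names are hypothetical. In phyloseq, the rough equivalent would be running `estimate_richness(physeq, measures = c("Observed", "Shannon"))` on the unagglomerated object and again on a `tax_glom`-ed copy.

```python
import math
from collections import defaultdict

# Hypothetical single sample: ASV -> (family, read count). Illustration only.
SAMPLE = {
    "ASV1": ("Bacteroidaceae", 30),
    "ASV2": ("Bacteroidaceae", 20),
    "ASV3": ("Bacteroidaceae", 10),
    "ASV4": ("Lachnospiraceae", 25),
    "ASV5": ("Lachnospiraceae", 15),
}


def observed(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with reads."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def by_family(sample):
    """Sum read counts per family (a rough analogue of tax_glom to family)."""
    fam = defaultdict(int)
    for family, n in sample.values():
        fam[family] += n
    return list(fam.values())


asv_counts = [n for _, n in SAMPLE.values()]
fam_counts = by_family(SAMPLE)

print("ASV level:   ", observed(asv_counts), round(shannon(asv_counts), 3))
print("family level:", observed(fam_counts), round(shannon(fam_counts), 3))
```

Both indices are lower at the family level (5 taxa vs. 2), which mirrors the drop in richness described in the question. The lower value is not wrong; it simply measures diversity at a coarser rank, which is why it helps to report which level each analysis used.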