grunwaldlab / metacoder_documentation

Documentation for the metacoder R package
https://grunwaldlab.github.io/metacoder_documentation/

Use relative abundance table for diversity statistics #12

Open Dahn-YoungDong opened 2 years ago

Dahn-YoungDong commented 2 years ago

Hello, in your tutorial you say: "Alpha diversity measures the diversity within a single sample and is generally based on the number and relative abundance of taxa at some rank (e.g. species or OTUs). Beta diversity also uses the number of relative abundance of taxa at some rank, but measures variation between samples."

I have relative abundance output from Metaphlan 3. However, some other forums advise against using relative abundance. In particular, your tutorial data are rarefied OTU counts, not relative abundances. Do you think that makes a difference?

Dahn-YoungDong commented 2 years ago

@zachary-foster

zachary-foster commented 2 years ago

The term "relative abundance" is being used differently in the two contexts. In:

"Alpha diversity measures the diversity within a single sample and is generally based on the number and relative abundance of taxa at some rank (e.g. species or OTUs). Beta diversity also uses the number of relative abundance of taxa at some rank, but measures variation between samples."

I am talking about how metrics take into account both the number of species and their abundance. The term "relative" here is used to emphasize that differences in abundance, even if the same species are present, influence the diversity metric. Whether the abundance measure is read counts or proportions is not important in this context. The word "relative" could be removed and the sentence would have the same meaning.
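To make that concrete, here is a minimal sketch in plain Python (not metacoder code; the `shannon` helper and the example counts are made up for illustration). It assumes the Shannon index as the diversity metric. The metric normalizes its input internally, so counts and proportions give the same value, while a change in how abundance is distributed among the same four taxa changes the result.

```python
import math

def shannon(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)); input is normalized to proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

# Hypothetical sample: four taxa with uneven read counts.
counts = [50, 30, 15, 5]
proportions = [c / sum(counts) for c in counts]

# Same four taxa, but perfectly even abundances.
even_counts = [25, 25, 25, 25]

h_counts = shannon(counts)       # computed from read counts
h_props = shannon(proportions)   # computed from proportions
h_even = shannon(even_counts)    # same taxa, different abundance distribution

print(h_counts, h_props, h_even)
```

The first two values should agree up to floating-point error, and the even community should score higher even though the same taxa are present in both.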

In the other context, "relative abundance" means converting the read counts to proportions. Some metrics assume unfiltered read count data (e.g. Chao1), so proportions cannot be used with them. Most other metrics can use proportions, but simply converting to proportions does not account for the effect of differences in sampling depth. As a result, the same community sequenced with more reads will tend to have a higher diversity value, whether you use the raw read counts or proportions. The purpose of rarefaction is to avoid this bias. You could also set to 0 any proportion that is less than 1 / (the minimum read depth of any sample).
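The sampling depth effect can be sketched with a small simulation. This is plain Python using only the standard library, not metacoder code. The community, the read depths, and the helper names (`sequence`, `rarefy`, `richness`, `drop_below_detection`) are all made up for illustration, and I am assuming that "minimum read depth" means the smallest total read count among the samples being compared.

```python
import random
from collections import Counter

def sequence(community, depth, rng):
    """Draw `depth` reads from a community given as {taxon: relative weight}."""
    taxa = list(community)
    weights = [community[t] for t in taxa]
    return Counter(rng.choices(taxa, weights=weights, k=depth))

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a count table."""
    reads = list(counts.elements())
    return Counter(rng.sample(reads, depth))

def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)

def drop_below_detection(counts, min_depth):
    """Convert to proportions, zeroing any proportion below 1 / min_depth."""
    total = sum(counts.values())
    threshold = 1 / min_depth
    return {t: (n / total if n / total >= threshold else 0.0)
            for t, n in counts.items()}

rng = random.Random(1)

# Hypothetical community: 10 common taxa and 190 rare ones.
community = {f"common_{i}": 0.09 for i in range(10)}
community.update({f"rare_{i}": 0.1 / 190 for i in range(190)})

# The same community sequenced at two depths.
shallow = sequence(community, 200, rng)
deep = sequence(community, 20000, rng)

# Rarefy the deep sample down to the shallow sample's depth.
deep_rarefied = rarefy(deep, 200, rng)

# Alternative: keep proportions, but zero those too small to have been
# detectable at the minimum depth (200 reads here).
filtered = drop_below_detection(deep, min_depth=200)
kept = [t for t, p in filtered.items() if p > 0]

print("richness shallow:", richness(shallow))
print("richness deep:", richness(deep))
print("richness deep, rarefied:", richness(deep_rarefied))
print("taxa kept after threshold:", len(kept))
```

In this setup the deep sample should detect far more taxa than the shallow one, even though both come from the same community, and converting the deep counts to proportions would not change that. Rarefying the deep sample, or zeroing the proportions below 1 / 200, brings the two samples onto a comparable footing. Note that the thresholded proportions no longer sum to 1.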

Sorry if that is confusing. Let me know if it does not make sense.