qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License

support other normalization strategies and option to disable rarefying in `core-*` methods #161

Open nbokulich opened 6 years ago

nbokulich commented 6 years ago

Improvement Description
E.g., proportion and DESeq variance stabilization would be good alternative strategies for weighted distance methods, as shown here.
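Proportion (total-sum) normalization followed by a weighted distance can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Python. It assumes a feature table stored as a dict mapping sample IDs to count lists in a shared feature order. The function names and toy counts are invented here and are not q2-diversity's API. DESeq variance stabilization is not shown.

```python
# Hypothetical sketch, not q2-diversity's implementation.
# Assumes each sample is a list of counts in the same feature order.

def to_proportions(counts):
    """Proportion (total-sum) normalization: divide each count by the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize a sample with zero reads")
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


# Toy table (invented): same composition, 10-fold difference in sequencing depth.
table = {
    "S1": [10, 0, 30, 60],      # 100 reads
    "S2": [100, 0, 300, 600],   # 1000 reads
}

raw = bray_curtis(table["S1"], table["S2"])
prop = bray_curtis(to_proportions(table["S1"]), to_proportions(table["S2"]))
print(round(raw, 4), round(prop, 4))  # 0.8182 0.0
```

On raw counts the two samples look quite dissimilar purely because of depth. After proportion normalization they are identical, which is the kind of depth correction a weighted metric needs without discarding reads.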

Questions
I realize this would need to be applied to beta diversity methods but not to alpha (unless said normalization strategies are okay for alpha data). Perhaps add a separate flag to disable rarefying and/or apply another normalization just to beta analyses?

References

  1. here.
  2. Raised in this forum thread

jairideout commented 6 years ago

forum xref

CarlyMuletzWolz commented 6 years ago

Normalization should be performed on the data prior to any diversity analyses, including alpha (e.g., Shannon), beta, and abundance analyses. Compared with raw data, normalization will not affect analyses of presence-absence data (e.g., Jaccard beta diversity), although the results will of course differ from those on rarefied data. Normalization will affect analyses that use sequence counts as information to estimate diversity (e.g., Chao1 alpha diversity). Normalization should be the last step in creating the OTU table (with no rarefaction implemented) that is then used for analyses. Normalization reduces the variation in sequence coverage among samples while retaining the original data. Low-coverage samples (approx. <1,000 sequences) should still be removed from analyses, though.
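The points about presence-absence metrics, rarefying, and low-coverage filtering can be illustrated with a small, hypothetical sketch. It uses proportion normalization as the example scaling method, since it keeps zeros as zeros. The 1,000-read cutoff is the approximate value from the comment above. The function names and toy counts are invented for illustration and do not reflect any q2-diversity API.

```python
import random

# Approximate low-coverage cutoff from the comment above (assumption: applied to raw read totals).
MIN_DEPTH = 1000


def drop_low_coverage(table, min_depth=MIN_DEPTH):
    """Remove samples whose total read count is below min_depth."""
    return {sample: counts for sample, counts in table.items() if sum(counts) >= min_depth}


def to_proportions(counts):
    """Proportion (total-sum) normalization; zeros stay zeros."""
    total = sum(counts)
    return [c / total for c in counts]


def jaccard(u, v):
    """Jaccard distance on presence/absence: 1 - |A & B| / |A | B|."""
    a = {i for i, x in enumerate(u) if x > 0}
    b = {i for i, x in enumerate(v) if x > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement (i.e., rarefying)."""
    # Expand counts into one entry per read, labeled by feature index.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for i in rng.sample(pool, depth):
        out[i] += 1
    return out


# Toy feature table (invented): sample ID -> counts per feature, shared feature order.
table = {
    "S1": [1200, 3, 0, 797],     # 2000 reads, includes a rare feature
    "S2": [500, 0, 400, 5100],   # 6000 reads
    "S3": [300, 200, 0, 0],      # 500 reads, below the cutoff
}

kept = drop_low_coverage(table)  # S3 is removed
s1, s2 = kept["S1"], kept["S2"]

# Presence/absence is unchanged by proportion normalization.
raw_j = jaccard(s1, s2)
prop_j = jaccard(to_proportions(s1), to_proportions(s2))
print(raw_j, prop_j)  # 0.5 0.5

# Rarefying S2 down to the shallowest retained depth discards reads at random.
rng = random.Random(0)
depth = min(sum(c) for c in kept.values())
rare_s2 = rarefy(s2, depth, rng)
print(depth, sum(rare_s2))  # 2000 2000
```

Because rarefying throws away reads at random, it can zero out low-count features. A scaling normalization like proportions cannot do that, which is why the two approaches can disagree even on presence-absence metrics.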

Alpha-diversity and statistical issues from rarefaction: https://www.biorxiv.org/content/early/2017/12/11/231878

An older post, with a discussion with McMurdie about rarefaction at the end of the thread, where he says: "I consider it a personal failure of mine that you could read "Waste Not, Want Not..." several times and come to the conclusion that rarefying is okay for alpha or beta diversity analyses....etc". I have since become an advocate of skipping rarefaction and normalizing prior to any diversity estimates: https://github.com/joey711/phyloseq/issues/603#issuecomment-219180008

nbokulich commented 6 years ago

forum xref