joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
586 stars 187 forks source link

Jackknified ordination plot #634

Closed davidealbanese closed 7 years ago

davidealbanese commented 8 years ago

I have recently developed a jackknifed version of plot_ordination() which estimates the uncertainty in PCoA plots. The general procedure is described here and here. Moreover, it is implemented in QIIME (http://qiime.org/tutorials/tutorial.html#steps-7-and-8-compare-pcoa-plots).

The function displays the confidence ellipses around the samples ordination plot:

library(phyloseq)
library(ggplot2)

data(GlobalPatterns)
theme_set(theme_bw())

GP = GlobalPatterns
human = get_variable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")
sample_data(GP)$human <- factor(human)

# PCoA plot using the rarefied table (unweighted UniFrac)
GP.rare = rarefy_even_depth(GP, rngseed=1, sample.size=min(sample_sums(GP)))
GP.ord <- ordinate(GP.rare, "PCoA", "unifrac", weighted=FALSE)
p1 = plot_ordination(GP.rare, GP.ord, type="samples", color="SampleType", shape="human")
p1 + geom_polygon(aes(fill=SampleType)) + geom_point(size=5) + ggtitle("samples") +
  theme(aspect.ratio=1)
p1

rplot01

# jackknified PCoA plot (unweighted UniFrac)
#10 replicates, 95% confidence ellipses assuming a multivariate normal distribution
p2 = jackknifed_plot_ordination(GP, n=10, sample.size=min(sample_sums(GP)), rngseed=1, 
                                color="SampleType", replace=T, axes=c(1,2), level=0.95,
                                shape="human", method="PCoA", distance="unifrac", weighted=FALSE) +
  theme(aspect.ratio=1)

rplot02

Are you interested in adding this feature in phyloseq? I can add this function+documentation in plot-methods.R and create a pull request or I can send you the code.

Davide

spholmes commented 8 years ago

Hi Davide Actually, this is not a jacknife per the statistical definition of the jacknife: which is a leave one out estimation of weighted statistics (where all the n-1 estimates get an (n-1) weight and the null sample an (n) weight. The estimates of variability can be useful, but we prefer a contour approach which is nonparamteric as ellipses make the user think that the variability is Gaussian (which it is not).

If you are interested in projections of uncertainty you might like this reference which shows nonparametric estimates of uncertainty done with contour plots: http://arxiv.org/abs/1601.05156

For the time being, we are not including this feature in phyloseq but pointing people to examples of uncertainty maps and tutorials.

Thanks for posting, Susan

On Tue, Jul 5, 2016 at 7:59 AM, Davide Albanese notifications@github.com wrote:

I have recently developed a jackknifed version of plot_ordination() which estimates the uncertainty in PCoA plots. The general procedure is described here http://aem.asm.org/content/73/5/1576.full and here http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3105689/. Moreover, it is implemented in QIIME ( http://qiime.org/tutorials/tutorial.html#steps-7-and-8-compare-pcoa-plots).

The function displays the confidence ellipses around the samples ordination plot:

library(phyloseq) library(ggplot2)

data(GlobalPatterns) theme_set(theme_bw()) GP = GlobalPatternshuman = get_variable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue") sample_data(GP)$human <- factor(human)

PCoA plot using the rarefied table (unweighted UniFrac)GP.rare = rarefy_even_depth(GP, rngseed=1, sample.size=min(sample_sums(GP)))GP.ord <- ordinate(GP.rare, "PCoA", "unifrac", weighted=FALSE)p1 = plot_ordination(GP.rare, GP.ord, type="samples", color="SampleType", shape="human")p1 + geom_polygon(aes(fill=SampleType)) + geom_point(size=5) + ggtitle("samples") +

theme(aspect.ratio=1)p1

[image: rplot01] https://cloud.githubusercontent.com/assets/3673156/16588672/378e7704-42d0-11e6-9b5a-a8c541e0a72c.png

jackknified PCoA plot (unweighted UniFrac)# 10 replicates, 95% confidence ellipses assuming a multivariate normal distributionp2 = jackknifed_plot_ordination(GP, n=10, sample.size=min(sample_sums(GP)), rngseed=1,

                            color="SampleType", replace=T, axes=c(1,2), level=0.95,
                            shape="human", method="PCoA", distance="unifrac", weighted=FALSE) +

theme(aspect.ratio=1)

[image: rplot02] https://cloud.githubusercontent.com/assets/3673156/16588682/43351540-42d0-11e6-838b-2de09788f5e2.png

Are you interested in adding this feature in phyloseq? I can add this function+documentation in plot-methods.R and create a pull request or I can send you the code.

Davide

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/634, or mute the thread https://github.com/notifications/unsubscribe/ABJcvQEUiL_iu--OVoRy7nWMA9b_ctLUks5qSnFRgaJpZM4JFNXY .

Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

davidealbanese commented 8 years ago

Dear Susan, I am aware of the statistical definition of jeckknifing. With this terminology I was talking about the (probably misleading) definition here and here.

Anyway, I see your approach (defined in http://arxiv.org/abs/1601.05156) implemented in https://github.com/boyuren158/DirichletFactor. I will try it.

Thank you, Davide

spholmes commented 8 years ago

Davide I am afraid this is not jackknifing as it was defined by the inventors of the jackknife:

Quenouille and Tukey, see the very nice book: An Introduction to the Bootstrap by Efron and Tibshirani or the very nice article by Efron: https://projecteuclid.org/euclid.aos/1176344552

as someone who wrote their thesis on Bootstrap and Jacknife in the multivariate context, I am afraid I cannot easily `redefine' it in projects I am associated to (especially as I am in the same dept as Efron).

But the main message is: there is no evidence that the data are ellipsoidal so I think ellipses built this way would just confuse people, if there are many samples and one is looking at sample averages where the CLT might kick in, this may be different, but there is still very little work in the context of the microbiome to prove this.

Best Susan

On Tue, Jul 5, 2016 at 10:27 AM, Davide Albanese notifications@github.com wrote:

Dear Susan, I am aware of the statistical definition of jeckknifing. With this terminology I was talking about the (probably misleading) definition here http://aem.asm.org/content/73/5/1576.full and here http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3105689/.

Anyway, I see your approach (defined in http://arxiv.org/abs/1601.05156) implemented in https://github.com/boyuren158/DirichletFactor. I will try it.

Thank you, Davide

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/634#issuecomment-230545253, or mute the thread https://github.com/notifications/unsubscribe/ABJcvYsLVuJLCNeuWkAoXrSOfWpPTUI4ks5qSpQHgaJpZM4JFNXY .

Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

davidealbanese commented 8 years ago

Susan, I agree with you. Could you point me to examples of uncertainty maps and tutorials using phyloseq and DirichletFactor?

Thank you, Davide

spholmes commented 8 years ago

Hi there Davide, We are writing up a separate R package for Dirfactor, but no tutorials yet, the paper is currently under revision so we are working actively on that right now.

Best Susan

On Tue, Jul 5, 2016 at 11:02 AM, Davide Albanese notifications@github.com wrote:

Susan, I agree with you. Could you point me to examples of uncertainty maps and tutorials using phyloseq and DirichletFactor?

Thank you, Davide

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/634#issuecomment-230554791, or mute the thread https://github.com/notifications/unsubscribe/ABJcvZ1ZNf3jYsG5wzYLTfMZGhuIpcWTks5qSpwpgaJpZM4JFNXY .

Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/