relate environmental variables to OTU clusters in network

joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:

http://joey711.github.io/phyloseq/


relate environmental variables to OTU clusters in network #473

Closed PAE-lab closed 9 years ago

PAE-lab commented 9 years ago

Hi,

How could I correlate OTU clusters in a network with environmental parameters? Is this possible using the sample_data file in phyloseq? Are there other possibilities? How do others do this?

Thanks!

PAE-lab commented 9 years ago

Anybody?

spholmes commented 9 years ago

This is a statistical methodology question, not a phyloseq question. The standard methods people currently use include, for instance, the vegan function adonis. You might look at some of the online workflow examples provided by Joey: http://joey711.github.io/phyloseq-demo/Restroom-Biogeography. Also see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3357092/

Best, Susan

Susan Holmes Professor, Statistics and BioX Director, MCS Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

joey711 commented 9 years ago

I agree with Susan. However, this is a much-requested feature. Any ideas for smoothing or better documenting this? I wouldn't want to imply that my Restroom example covers all needs.

audy commented 9 years ago

To measure the "agreement" between a clustering and another set of labels for the same items, you can use the homogeneity and completeness scores, or the harmonic mean of the two (v_measure).

There is some documentation on these in the scikit-learn package for Python: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.homogeneity_completeness_v_measure.html#sklearn.metrics.homogeneity_completeness_v_measure There is probably an implementation in R as well.
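
As a rough illustration of the scores mentioned above, here is a minimal sketch in plain Python (standard library only). It is not the scikit-learn implementation, just the usual entropy-based definitions written out. The function names and the example data (`network_cluster`, `habitat`) are made up for illustration, and the sketch assumes each OTU has already been given one categorical environmental label, which is itself a modelling choice.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), natural log."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(classes, clusters):
    """Return (homogeneity, completeness, V-measure).

    classes  : reference labels (here: a hypothetical environmental category)
    clusters : cluster assignments (here: hypothetical network modules)
    """
    if len(classes) != len(clusters):
        raise ValueError("label sequences must have the same length")
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    # By convention a score is 1.0 when the corresponding entropy is zero.
    homogeneity = 1.0 if h_classes == 0 else (
        1.0 - conditional_entropy(classes, clusters) / h_classes)
    completeness = 1.0 if h_clusters == 0 else (
        1.0 - conditional_entropy(clusters, classes) / h_clusters)
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


if __name__ == "__main__":
    # Made-up example: six OTUs, their network module, and an assumed
    # per-OTU habitat label (e.g. where each OTU is most abundant).
    network_cluster = ["A", "A", "A", "B", "B", "B"]
    habitat = ["soil", "soil", "water", "water", "water", "water"]
    h, c, v = homogeneity_completeness_v(habitat, network_cluster)
    print(f"homogeneity={h:.3f} completeness={c:.3f} v_measure={v:.3f}")
```

Homogeneity is high when each cluster contains only one label, completeness is high when each label falls into a single cluster, and the V-measure balances the two. These scores only apply to categorical labels; continuous environmental variables would need binning first or a different method altogether.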

joey711 commented 9 years ago

Thanks @audy

Between http://cran.r-project.org/web/views/Cluster.html , http://cran.r-project.org/web/views/gR.html , and http://cran.r-project.org/web/views/MachineLearning.html (among many others), I think R has this covered.

The OP, @Biyorun, has not made clear precisely what kind of test is desired. What network? Are the nodes "clusters of OTUs", or "OTU clusters" (AKA, OTUs)? What is the hypothesis to be tested, or the general nature of the inference to be made?

I think Susan's answer satisfies this issue, so I will close for now. However, please feel free to post back with reproducible example code and/or links to tutorials or blog posts where a relevant example is demonstrated.

Cheers

joey

PAE-lab commented 9 years ago

Dear Dr. McMurdie,

These are indeed clusters of OTUs in a network (see figure), not sequences clustered into OTUs.

I would like to superimpose on the network the environmental variables that are significantly associated with the observed clusters. Basically, my question was whether this can be done directly from the physeq object (with OTU and ENV files), or whether it has to (or can) be done in another way.

Furthermore, is it possible to use unweighted lines in the plot_net function?

Thanks in advance.

Kind regards