joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Ordination plots: using absolute sequence counts or relative abundances? #638

Closed: radikalana closed this issue 7 years ago

radikalana commented 8 years ago

Dear all,

I am Ana, and I am working with some rhizospheric bacterial communities. I am trying to obtain ordination plots (PCoA and NMDS) with phyloseq, but I have some questions.

I exported my biom file (which contains the OTUs, the number of sequences per OTU in each of my samples, and the corresponding taxonomy), along with a phylogenetic tree and a metadata table, and combined them into a phyloseq object. I then used the "ordinate" function to obtain PCoA plots (using weighted UniFrac and Bray-Curtis distances) and NMDS plots (based on the same two distances). So, I created 4 different plots.

Now I wonder whether what I did is OK, because I used the raw sequence counts from the biom file rather than the relative abundance of each OTU to obtain the plots. I have since transformed the data using "transform_sample_counts(my_file, function(x){x/sum(x)})", and now the ordination plots look different! So, the main question is: should I transform the dataset (number of sequences) into relative abundances of each OTU before running PCoA and NMDS with the above-mentioned distance metrics?
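For anyone who wants to reproduce the comparison described above, here is a minimal sketch. It assumes the `GlobalPatterns` example dataset that ships with phyloseq as a stand-in for my own object, and its `SampleType` column as the grouping variable; neither comes from my data.

```r
library(phyloseq)

# Assumption: GlobalPatterns (bundled with phyloseq) stands in for the real object.
data(GlobalPatterns)
ps_counts <- GlobalPatterns

# Convert each sample's counts to proportions that sum to 1.
ps_rel <- transform_sample_counts(ps_counts, function(x) x / sum(x))

# Same ordination method and distance, run on counts and on proportions.
ord_counts <- ordinate(ps_counts, method = "PCoA", distance = "bray")
ord_rel    <- ordinate(ps_rel,    method = "PCoA", distance = "bray")

# Build both plots so they can be compared side by side.
p_counts <- plot_ordination(ps_counts, ord_counts, color = "SampleType")
p_rel    <- plot_ordination(ps_rel,    ord_rel,    color = "SampleType")
```

Printing `p_counts` and `p_rel` should show whether the sample configuration changes once sequencing depth is divided out. Swapping in `method = "NMDS"` or `distance = "wunifrac"` follows the same pattern.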

I have just read in the FAQ that we should calculate the relative abundance of each OTU when using Bray-Curtis distances: "for a beta-diversity measure like Bray-Curtis Dissimilarity, you might simply use the relative abundance of each taxa in each sample, as the absolute counts are not appropriate to use directly in the context where count differences are not meaningful." But I do not understand why. I thought that Bray-Curtis dissimilarity is based on the number of sequences of each OTU, not on their relative abundance.
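A small worked example may help show what the FAQ sentence is getting at. This is only a sketch in base R: the two samples and their counts are hypothetical, and `bray_curtis` is a helper written for illustration using the standard formula, sum(|x - y|) / sum(x + y), not a phyloseq function.

```r
# Illustrative helper (not part of phyloseq): Bray-Curtis dissimilarity
# between two abundance vectors over the same OTUs.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Hypothetical samples with identical composition (10% / 20% / 70%)
# but a tenfold difference in sequencing depth.
a <- c(otu1 = 10,  otu2 = 20,  otu3 = 70)    # 100 reads in total
b <- c(otu1 = 100, otu2 = 200, otu3 = 700)   # 1000 reads in total

# On raw counts the depth difference alone produces a large dissimilarity.
bc_counts <- bray_curtis(a, b)

# On relative abundances the two samples are identical.
bc_relative <- bray_curtis(a / sum(a), b / sum(b))
```

Here `bc_counts` is 900/1100, about 0.82, while `bc_relative` is 0. So Bray-Curtis can be computed on either kind of input, but on raw counts it also responds to differences in the number of reads per sample.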

My second question is similar. Should I use the relative abundance of each OTU to build a hierarchical clustering (also based on Bray-Curtis or weighted UniFrac distances) to group my samples? And what about PERMANOVA? Again, the same question: relative abundances or absolute counts?
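For reference, both analyses start from the same distance matrix, so the choice of input is made once, before the distance is computed. A minimal sketch, again assuming `GlobalPatterns` and its `SampleType` variable as stand-ins for my data, with PERMANOVA run through `adonis2` from the vegan package and average linkage chosen arbitrarily for the clustering:

```r
library(phyloseq)
library(vegan)

# Assumption: GlobalPatterns (bundled with phyloseq) stands in for the real object.
data(GlobalPatterns)

# Relative abundances are used here; passing the untransformed object instead
# gives the count-based version of the same analyses.
ps_rel <- transform_sample_counts(GlobalPatterns, function(x) x / sum(x))

# One Bray-Curtis distance matrix feeds both analyses.
d_bray <- phyloseq::distance(ps_rel, method = "bray")

# Hierarchical clustering of samples (average linkage is an arbitrary choice).
hc <- hclust(d_bray, method = "average")

# PERMANOVA: test whether the grouping variable explains the distances.
meta <- data.frame(sample_data(ps_rel))
set.seed(1)  # permutation p-values depend on the random seed
perm <- adonis2(d_bray ~ SampleType, data = meta, permutations = 999)
```

`plot(hc)` draws the dendrogram and `print(perm)` shows the PERMANOVA table. Replacing `"bray"` with `"wunifrac"` should work the same way as long as the object contains a tree.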

Thanks a lot!

joey711 commented 7 years ago

https://www.bioconductor.org/packages/release/bioc/vignettes/phyloseq/inst/doc/phyloseq-FAQ.html#i-need-help-analyzing-my-data.-it-has-the-following-study-design