joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
579 stars 188 forks source link

DPCoA ordination on log transformed sample counts #450

Closed jdpr-zz closed 8 years ago

jdpr-zz commented 9 years ago

Hi Everybody!

I ran into an issue when trying to plot taxa and samples in ordination plots using DPCoA as ordination method.

If the sample counts are not transformed but just normalized (i.e. relative abundance or rarefied) then everything works fine. If however I work on log+1 transformed sample_counts, the samples are plotted in a smaller space compared to the corresponding taxa.

I'm running RStudio Version 0.98.1091 and R Version 3.1.2 (2014-10-31) and Phyloseq ‘1.10.0’

Here is some code on GlobalPatterns to reproduce the issue:

#Data loading & preprocessing as in tutorial
data(GlobalPatterns) 
GP = GlobalPatterns
wh0 = genefilter_sample(GP, filterfun_sample(function(x) x > 5), A = 0.5 * nsamples(GP))
GP1 = prune_taxa(wh0, GP)
GP1 = transform_sample_counts(GP1, function(x) 1e+06 * x/sum(x))
phylum.sum = tapply(taxa_sums(GP1), tax_table(GP1)[, "Phylum"], sum, na.rm = TRUE)
top5phyla = names(sort(phylum.sum, TRUE))[1:5]
GP1 = prune_taxa((tax_table(GP1)[, "Phylum"] %in% top5phyla), GP1)
human = get_variable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")
sample_data(GP1)$human <- factor(human)

#calculate ordination using DPCoA
GP.ord <- ordinate(GP1, "DPCoA")

#split plot with normal dataset
p4 = plot_ordination(GP1, GP.ord, type = "split", color = "Phylum", shape = "human", 
                     label = "SampleType", title = "split")
p4

dpcoa_norm

This works fine and results in two plots, with the points spanning the same area / space.


Now we log+1 transform the sample_counts:

#log transform sample counts
GP1.log <- transform_sample_counts(GP1, function(x) log(x+1))

#calculate ordination of log transformed sample counts
GP.log.ord <- ordinate(GP1.log, "DPCoA")

#split plot with log trandormed sample counts
p4 = plot_ordination(GP1.log, GP.log.ord, type = "split", color = "Phylum", shape = "human", 
                     label = "SampleType", title = "split")
p4

dpcoa_log

This results in the points plotted on different scales.

I can't come up with an Idea how this would make sense. Any ideas or any help would be much appreciated!

Thanks and Cheers jdpr

jdpr-zz commented 9 years ago

Some more thoughts on this issue: The phylogenetic distance is not affected by the log-transformation while the "weighting" is. This is presumably the reason why the "taxa"-plots of both the transformed and the untransformed have the identical dimensions. The exact placement of the taxa however changes due to the log-transformation since the influence of abundant taxa gets smaller. Since DPCoA should then place the samples at the centroid of the respective set of taxa I would expect the placement of the samples on the same scale as the placement of the taxa irrespective of the transformation.