joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
582 stars 187 forks source link

Obtaining and/or plotting ordination scores other than "scaling 2"/correlation biplots #816

Closed JacobRPrice closed 7 years ago

JacobRPrice commented 7 years ago

Hi Everyone,

Looking through the documentation and functions for ordinate() and plot_ordination() and I've noticed that there is no option to designate what "scaling" of scores is returned by those functions. My understanding is that the default behavior of vegan is to return "scaling type 2" scores when prompted, and that this type of scaling is typically used to interrogate the relationships between species while approximating distances/structures amongst sites/samples.

  1. Did I miss something in the documentation that allows an argument along the lines of "scaling=2" (as in plot.cca() within vegan)? I was thinking it could be automatically selected based upon the "type" argument in plot_ordination() but was unable to locate that functionality.
  2. If not, is there rationale that I'm unaware of for this omission (which could be entirely possible)?

Thank you for any insight you guys would be able to provide.

Jake

shandley commented 7 years ago

You can always convert the PhyloSeq object to a table as so:

ps0.table <- as.data.frame(otu_table(ps0))

and if you want the sample data you can:

sd <- as.data.frame(sample_data(ps0))

Then you can run them using traditional vegan code.

JacobRPrice commented 7 years ago

Hi @shandley Thank you for your response. I'm familiar with moving phyloseq data to vegan and back again because I prefer using vegan's modeling and testing features. The visual functionality of phyloseq (especially in terms of ease of use) is very attractive though, and I was hoping to avoid rewriting the plot_ordination() function to suit my purposes. I thought that perhaps I was unaware of how the scores were scaled within the function or, even worse, feared that my understanding of scaling was somehow incorrect.

Do you think that rewriting the function is the way to go?

joey711 commented 7 years ago

@JacobRPrice I don't think there's anything to re-write. Any additional coordinate features in the ordination can be extracted, and added as additional layers to the ggplot object produced by plot_ordination.

Michelle Berry has posted a nice example of this general approach (but for CAP):

http://deneflab.github.io/MicrobeMiseq/demos/mothur_2_phyloseq.html#constrained_ordinations

JacobRPrice commented 7 years ago

@joey711 Thank you for sending the link over. I've taken a look at the capscale section, and it appears that her approach is still pulling the "scale 2" version of the site scores over.

Here's a reproducible example to hopefully clarify:

library(phyloseq)
library(ggplot2)
data("GlobalPatterns")
GP = GlobalPatterns
wh0 = genefilter_sample(GP, filterfun_sample(function(x) x > 5), A=0.5*nsamples(GP))
GP1 = prune_taxa(wh0, GP)
GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x))
phylum.sum = tapply(taxa_sums(GP1), tax_table(GP1)[, "Phylum"], sum, na.rm=TRUE)
top5phyla = names(sort(phylum.sum, TRUE))[1:5]
GP1 = prune_taxa((tax_table(GP1)[, "Phylum"] %in% top5phyla), GP1)
human = get_variable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")
sample_data(GP1)$human <- factor(human)
gp.rda<-ordinate(GP1, "CCA")
plot_ordination(GP1, gp.rda, type="samples", color="SampleType", shape="human")
plot_ordination(GP1, gp.rda, type="species", color="Phylum")
pt<-plot_ordination(GP1, gp.rda, type="samples", color="SampleType", shape="human")

We can extract the scores and compare them:

cbind(vegan::scores(gp.rda, display="sites", scaling=1), 
vegan::scores(gp.rda, display="sites", scaling=2),
"ptCA1"=pt$data$CA1,"ptCA2"=pt$data$CA2)

which yields:

                 CA1         CA2         CA1         CA2       ptCA1
CL3      -0.11729885  0.02169781 -0.12137190  0.02264996 -0.12137190
CC1       0.07266212  0.08099170  0.07518522  0.08454582  0.07518522
SV1      -0.01304275  0.04347983 -0.01349564  0.04538784 -0.01349564
M31Fcsw  -1.46011447 -0.17210270 -1.51081505 -0.17965502 -1.51081505
M11Fcsw  -1.45704941 -0.16702408 -1.50764356 -0.17435352 -1.50764356
M31Plmr   0.97892381 -1.51217987  1.01291567 -1.57853823  1.01291567
M11Plmr   0.68883485 -1.14857677  0.71275375 -1.19897928  0.71275375
F21Plmr   0.89168336 -1.37823562  0.92264590 -1.43871616  0.92264590
M31Tong   1.05051708 -1.53363957  1.08699492 -1.60093964  1.08699492
M11Tong   0.97475981 -1.23774601  1.00860709 -1.29206151  1.00860709
LMEpi24M  0.97088317 -0.84987576  1.00459583 -0.88717051  1.00459583
SLEpi20M  1.05746424 -0.64827270  1.09418332 -0.67672058  1.09418332
AQC1cm    0.71646515  1.04095150  0.74134348  1.08663114  0.74134348
AQC4cm    0.74253294  1.08631584  0.76831643  1.13398618  0.76831643
AQC7cm    0.74124591  1.04771758  0.76698471  1.09369413  0.76698471
NP2       0.77635532  1.76459068  0.80331325  1.84202548  0.80331325
NP3       0.77077049  1.76006757  0.79753450  1.83730388  0.79753450
NP5       0.74069577  1.66058108  0.76641547  1.73345167  0.76641547
TRRsed1  -0.27891678  0.26982318 -0.28860181  0.28166371 -0.28860181
TRRsed2  -0.41255547  0.09095019 -0.42688092  0.09494132 -0.42688092
TRRsed3  -0.63901982  0.09832849 -0.66120896  0.10264340 -0.66120896
TS28     -1.43082129 -0.16560719 -1.48050471 -0.17287446 -1.48050471
TS29     -1.44586350 -0.15999812 -1.49606924 -0.16701925 -1.49606924
Even1    -1.17712661 -0.09950037 -1.21800081 -0.10386671 -1.21800081
Even2    -1.17849951 -0.10373667 -1.21942138 -0.10828890 -1.21942138
Even3    -1.18031364 -0.10417154 -1.22129850 -0.10874286 -1.22129850
               ptCA2
CL3       0.02264996
CC1       0.08454582
SV1       0.04538784
M31Fcsw  -0.17965502
M11Fcsw  -0.17435352
M31Plmr  -1.57853823
M11Plmr  -1.19897928
F21Plmr  -1.43871616
M31Tong  -1.60093964
M11Tong  -1.29206151
LMEpi24M -0.88717051
SLEpi20M -0.67672058
AQC1cm    1.08663114
AQC4cm    1.13398618
AQC7cm    1.09369413
NP2       1.84202548
NP3       1.83730388
NP5       1.73345167
TRRsed1   0.28166371
TRRsed2   0.09494132
TRRsed3   0.10264340
TS28     -0.17287446
TS29     -0.16701925
Even1    -0.10386671
Even2    -0.10828890
Even3    -0.10874286

The values in the first two columns are the site scores with scale=1, and the values in the second two columns (which are the one's being plotted) are the site scores with scale=2. The last two columns are the values actually being used in the plot (by plot_ordination()).

I'm certainly willing to defer to both of your experience, but I wasn't sure if this behavior was intended.

joey711 commented 7 years ago

To answer your OP, I just double-checked and phyloseq is passing along to a scores() generic, using the default value for scaling argument. So what you are asking is really a vegan question, rather than a phyloseq question. I agree their doc for scores.cca is a little confusing on the matter of scaling parameter, but you can check for yourself by adding the following to the end of your example

all.equal(check.attributes = FALSE,
  target = vegan::scores(gp.rda, display="sites"),
  current = cbind(ptCA1=pt$data$CA1, ptCA2=pt$data$CA2)
)
# returns TRUE
JacobRPrice commented 7 years ago

Thanks for the confirmation @joey711 !