joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Plotting samples and subset of taxa in one plot #1087

Open LunavdL opened 5 years ago

LunavdL commented 5 years ago

Hi Joey, I have a microbial dataset with taxa (species) and samples and I'm plotting these in a PCoA based on Bray-Curtis. Based on the PERMANOVA, I selected 20 taxa out of >5600 that are most important for the significant differences between treatments. So far I can plot either species or samples, or show these in split graphs, but I would like to make an ordination plot that shows all the samples and the 20 selected taxa in one plot.

I tried to make this work with subset_taxa(), but that also alters the ordination of the samples in the plot, while I would like the plot to be based on the full dataset, and just show the position of a few taxa in addition.

With vegan, I could add this with:

p <- ordipointlabel(pca, display="sites")
with(env, points(pca, display = "sites"))
text(pca, display="species", cex = 0.8, col = "darkcyan", select=sel)

where "sel" was the list of names I wanted to show.

Is this also possible with phyloseq? I really like phyloseq and would prefer to be able to do it all with one package!

This is the piece of code I have so far:

ordu <- ordinate(physeq, "PCoA", "bray")
p13 = plot_ordination(physeq, ordu, type="samples", 
                      color="treatment1",
                      shape="steri_treat") 
p13 + geom_point(size=5) +
  scale_colour_manual(values=c("darkorchid4", "dodgerblue4", "firebrick4", "chocolate2")) +
  scale_shape_manual(values=c(19, 21)) +
  ggtitle("Bacteria (species level) PCoA Bray-Curtis") +
  geom_line()

Cheers, Luna

mikemc commented 5 years ago

Luna, I recommend doing the following, illustrated with the GlobalPatterns example data. First, some setup:

library(phyloseq)
library(dplyr)
library(ggplot2)

data(GlobalPatterns)
ps <- GlobalPatterns
# Make sure to convert to proportions before computing Bray-Curtis dissimilarity
ps.ra <- ps %>%
    transform_sample_counts(function (x) x / sum(x))

We'll add the taxon / OTU label to the tax_table, allowing us to filter by it later on,

tax_table(ps.ra) <- cbind(tax_table(ps.ra), OTU = taxa_names(ps.ra))

and make a list of the taxa we'll want to plot. I'm just going to pick the first 10 taxa for this example.

plot_otus <- taxa_names(ps.ra)[1:10]

From here, there are a couple ways we could go. Sticking closest to what you've done above, we'll make the sample plot:

ordu <- ordinate(ps.ra, "PCoA", "bray")
p <- plot_ordination(ps.ra, ordu, type="samples")

to which we'll add the taxa. We get the dataframe for the "taxa" plot using all taxa, and then filter to just the taxa we want to plot:

taxdf <- plot_ordination(ps.ra, ordu, type="taxa", justDF = TRUE)
taxdf <- taxdf %>%
    filter(OTU %in% plot_otus)

and add these to the samples ordination

p + 
    geom_point(data = taxdf, color = "blue", shape = 3, size = 3)

Alternately, you could use the biplot option,

p <- plot_ordination(ps.ra, ordu, type="biplot")
p$data <- p$data %>%
    filter((id.type == "Samples") | (OTU %in% plot_otus))
p

LunavdL commented 5 years ago

That works perfectly - thank you for your help!

thierryjanssens commented 2 years ago

Hi,

How do you get a label (e.g. the OTU or LCA rank) plotted next to the points from the subset? When I use

p + geom_point(data = taxdf, color = "black", shape = 1) + geom_text(data = taxdf, label = taxdf$LCA)

I get this error:

Error in FUN(X[[i]], ...) : object 'Treatment' not found

It seems to interfere with the main phyloseq ordination plot, in which Treatment has been used in a label.

Thank you in advance.

Kind regards,

T.
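
A possible explanation for the error above, offered as a sketch rather than a confirmed fix: by default a ggplot2 layer inherits the aesthetic mappings of the main plot, so if the sample plot maps something to Treatment (for example through color="Treatment"), geom_text() looks for a Treatment column in taxdf and fails. The geom_point() call avoids this only because color and shape are set to constants there. Giving geom_text() its own mapping and setting inherit.aes = FALSE should sidestep the problem. The data frames below are hypothetical stand-ins, and the column names Axis.1 and Axis.2 are an assumption based on what plot_ordination() produces for a PCoA.

```r
library(ggplot2)

# Hypothetical stand-ins for the real data:
# `sampdf` mimics the sample coordinates plus a Treatment column,
# `taxdf` mimics the filtered taxa coordinates plus an LCA column.
sampdf <- data.frame(
  Axis.1 = c(-0.3, -0.1, 0.2, 0.4),
  Axis.2 = c(0.1, -0.2, 0.3, -0.1),
  Treatment = c("A", "A", "B", "B")
)
taxdf <- data.frame(
  Axis.1 = c(-0.2, 0.1, 0.3),
  Axis.2 = c(0.2, 0.0, -0.2),
  LCA = c("taxon1", "taxon2", "taxon3")
)

# Stand-in for plot_ordination(..., color = "Treatment"):
# the plot-level mapping includes Treatment.
p <- ggplot(sampdf, aes(x = Axis.1, y = Axis.2, color = Treatment)) +
  geom_point(size = 3)

p_labelled <- p +
  # color is set to a constant here, so the inherited color mapping is dropped
  geom_point(data = taxdf, color = "black", shape = 1) +
  # inherit.aes = FALSE stops this layer from looking for Treatment in taxdf
  geom_text(
    data = taxdf,
    mapping = aes(x = Axis.1, y = Axis.2, label = LCA),
    inherit.aes = FALSE,
    vjust = -0.8, size = 3
  )
p_labelled
```

With the real objects, the same geom_text() call would be added to the plot returned by plot_ordination(), using whichever tax_table column holds the label of interest in place of LCA.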