joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Plot heatmap for Sample-Sample Distance #241

Closed zachcp closed 11 years ago

zachcp commented 11 years ago

Hi Joey711,

One feature I would like to see is a way to visualize sample-sample distances as a heatmap: something like the plot_heatmap function, but applied to the distance matrix. Unlike a straight plot of the distance matrix, this function would use a clustering algorithm to place similar samples adjacent to one another.

In my case I have a lot of samples with a lot of OTUs. Trying to plot a heatmap of all the data brings my computer to a halt. Calculating a distance matrix is pretty easy, though, so I could use this function to quickly find similarities between samples. I suppose this would be a heatmap version of the plot_network method.

Thanks, zach cp

joey711 commented 11 years ago

I'm not sure I will add an official phyloseq plot function for this. I think it might be one or two ggplot2 commands. I'll mock up an example and let you know.

zachcp commented 11 years ago

Thanks Joey711,

I ended up using the phyloseq wrapper for vegan to return a dist object, and then plotted it with heatmap/NeatMap. It is not too tricky, and adding a dedicated function might just clutter the core of phyloseq. Thanks for your input.

zach cp

joey711 commented 11 years ago

K. I'll close this issue after one of us posts some example code, in case someone else is searching for the same functionality.

Thanks for the feedback, and best

zachcp commented 11 years ago

OK. The basics are below. The dist object returned by the distance function can also be passed to NeatMap for different types of heatmap arrangements.

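    # Get sample-sample distances with the phyloseq distance wrapper function
    # (phyloobject is a placeholder for your own phyloseq object)
    d = distance(phyloobject, method='jaccard')  # try distance('list')
    # Cluster the samples from the distance object.
    # hclust's method argument names a linkage rule ('average', 'complete', ...);
    # 'euclidean' is a distance, not a linkage, and would raise an error here.
    hc = hclust(d, method='average')
    # Plot the hclust object as a dendrogram
    plot(hc)
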
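
To make the "similar samples end up adjacent" idea concrete, here is a minimal base-R sketch on a toy example. The count matrix and sample names are made up for illustration, and `dist()` (Euclidean) is only a stand-in for phyloseq's `distance()`; any `dist` object could be used in its place.

```r
# Toy abundance matrix: 4 hypothetical samples (rows) x 3 hypothetical OTUs.
# A1/A2 resemble each other, as do B1/B2, but they are interleaved on purpose.
counts <- matrix(
  c(10, 0, 1,   # A1
     0, 8, 7,   # B1
     9, 1, 0,   # A2
     1, 9, 5),  # B2
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "B1", "A2", "B2"), c("otu1", "otu2", "otu3"))
)

# Stand-in for distance(physeq, method = ...): any 'dist' object works here
d <- dist(counts)

# Cluster once, then reuse the leaf order to rearrange rows and columns
hc <- hclust(d, method = "average")
m <- as.matrix(d)
m_ordered <- m[hc$order, hc$order]
print(round(m_ordered, 2))

# Base-R heatmap that reuses the same dendrogram instead of re-clustering
heatmap(m, Rowv = as.dendrogram(hc), symm = TRUE, scale = "none")
```

Passing `Rowv = as.dendrogram(hc)` matters because `heatmap()` otherwise clusters the rows of whatever matrix it is given with its own defaults, which may not match the dendrogram drawn by `plot(hc)`.
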
joey711 commented 11 years ago

Thanks for posting the code, @zachcp ! It will hopefully help someone else needing the same options. And always nice to have code contributions from others via GitHub. It definitely enriches the documentation for phyloseq and for interacting with R in general. I'm never going to think of everything helpful to document on my own, so contributing examples like this is invaluable.

Here is a slight modification to your code to make it fully reproducible for others, and with the figure output:

Load package and data

    library("phyloseq")
    library("ggplot2")
    data("GlobalPatterns")

Now calculate the distances, run the hierarchical clustering, and plot just the dendrogram with base R graphics.

    d = distance(GlobalPatterns, method='bray')
    # hclust's "ward" method was renamed "ward.D" in R 3.1.0
    plot(hclust(d, method="ward.D"))

[Figure: sample-heatmap1]

And now the heatmap with base R graphics:

    heatmap(as.matrix(d))

[Figure: sample-heatmap2]
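
For anyone who wants the ggplot2 route mentioned earlier in the thread, the following is one possible sketch, not an official phyloseq function. It assumes the phyloseq and ggplot2 packages are installed; the `"average"` linkage, the column names `sample1`/`sample2`/`distance`, and the color choices are arbitrary illustrative picks.

```r
library("phyloseq")
library("ggplot2")
data("GlobalPatterns")

# Sample-sample Bray-Curtis distances, as in the code above
d <- phyloseq::distance(GlobalPatterns, method = "bray")

# Cluster once and reuse the dendrogram leaf order on both axes
hc <- hclust(d, method = "average")
sample_order <- hc$labels[hc$order]

# Reshape the square distance matrix to long format: one row per sample pair
df <- as.data.frame(as.table(as.matrix(d)), responseName = "distance")
names(df)[1:2] <- c("sample1", "sample2")

# Factor levels control the axis order, so similar samples sit next to each other
df$sample1 <- factor(df$sample1, levels = sample_order)
df$sample2 <- factor(df$sample2, levels = sample_order)

p <- ggplot(df, aes(x = sample1, y = sample2, fill = distance)) +
  geom_tile() +
  scale_fill_gradient(low = "steelblue", high = "white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = NULL, y = NULL, fill = "Bray-Curtis")
print(p)
```

Because only the distance matrix is plotted, the number of tiles grows with the square of the number of samples and does not depend on the number of OTUs.
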