joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
579 stars 188 forks source link

Heatmaps for paired data #621

Closed lisawemo closed 8 years ago

lisawemo commented 8 years ago

Hi, I am analyzing 16 microbiome data from the lung and mouth and I'm basically teaching myself qiime and R. So I've finally made it through qiime and have uploaded two files into R. My OTU table using:

otu=import_biom('C:\Users\OneDrive\otu_table.biom')

and my sort of "map" file which just contains SampleID, Location, and Paired. For example: 1_L Lung 1 1_M Mouth 1 2_L Lung 2 2_M Mouth 2

map=read.csv('C:\Users\OneDrive\Mouth vs lung R map_adj.csv',header=T,row.name=1,stringsAsFactors=F)

My data is paired and I want to compare the lung vs the mouth using a heatmap. I've been having trouble with R. First off it won't let me do it by location. I've tried:

plot_heatmap(otu, sample.label="SampleType") And I got this error: Error in get_variable(physeq, sample.label) : Your phyloseq data object does not have a sample-data component Try ?sample_data for more details.

I went to ?sample_data and I have no idea what I am missing here.

Secondly, I'd like to do it by location and paired if that makes sense.
Can anyone help me with this code and maybe explain what it is I'm missing here, because I also have data where I look at the lung microbiome at baseline and then again 2 months later so that also is paired data.

Thank you for your help!

jeffkimbrel commented 8 years ago

Just glancing at this... there are a number of potential issues.

  1. Did you build a phyloseq object from your OTU/map file? It isn't clear that you have the right objects created.
  2. If so, you don't seem to have a column in your map file called "SampleType", which you are trying to call in your plot_heatmap command.
  3. What happens if you try sample_data(xxx) replacing xxx with the name of your phyloseq object? Also try sample_variables(xxx) to see what mapping data you actually have.
lisawemo commented 8 years ago

Hi Jeff, Thank you for your response! Like I had mentioned before I have never used R and this phyloseq package. I did not build a phyloseq object. I tried:

phyloseq(otu2) otu2 is my merged otu_table.biom and my "mapping file". Example of how my mapping file looks: SampleID Location Paired 1_L Lung 1 1_M Mouth 1 2_L Lung 2 2_M Mouth 2

Error: Error in phyloseq(otu2) : Problem with OTU/taxa indices among those you provided. Check using intersect() and taxa_names()

I do not have a SampleType column. Do I just need to replace Location with SampleType?

Here is my whole code:

library("phyloseq") packageVersion("phyloseq")

otu=import_biom('C:\Users\closed-ref-33031010\otu_table.biom')

map=read.csv('C:\Users\Qiime Maps\Mouth vs lung R map_adj.csv',header=T,row.name=1,stringsAsFactors=F)

sampledata1=sample_data(map)

otu2=merge_phyloseq(otu,sampledata1) plot_heatmap(otu2)

I ran:

sample_data(otu2) and the full "mapping" file was there with, part of it: Sample Data: [67 samples by 3 sample variables]: Lung Description Paired 2117-S Lung extra 33 2195-M4 Mouth intra 19 1053-S2 Lung extra 32

sample_variables(otu2) [1] "Lung" "Description" "Paired"

jeffkimbrel commented 8 years ago

To analyze your own data using Phyloseq (the R package), you need to combine all of your data into a phyloseq object. This phyloseq object is really a single object that Phyloseq (the R package) needs to run the analysis. To learn how to create a phyloseq object, type:

library("phyloseq")
?phyloseq

You'll also find to get the most out of Phyloseq (the R package), you'll need additional data such as taxonomic assignment and a tree. Those aren't required, but you're limited without them.

Lastly, before you get too far with your own data, check out the terrific Phyloseq tutorials page using the GlobalPatterns dataset that is included. You'll learn more about how to use Phyloseq by using a properly created phyloseq object. For general R related questions, Stack Overflow is usually the best place to check first.

Good luck!

lisawemo commented 8 years ago

Hi Jeff, I went to ?phyloseq page and it looks like I've done a merge_phyloseq function already with just my OTU table and mapping file. And I've already been to the heatmap Phyloseq tutorial page, but I'm just wondering how to get the SampleType.
Thanks again for your help!

jeffkimbrel commented 8 years ago

SampleType isn't something that exists in your data, because it is simply a name for some data that exists in the example data. For example, when running...

data("GlobalPatterns")
sample_variables(GlobalPatterns)

# and your data
sample_variables(otu2)

...you can see that SampleType exists as metadata in the GlobalPatterns example, but you don't have that variable in your data. It looks like you have a variable called Description, so try that and see what happens:

plot_heatmap(otu2, sample.label="Description")

Also, when you just try plot_heatmap(otu2), do you get a heatmap or an error?

lisawemo commented 8 years ago

I get a heatmap when I run plot_heatmap(otu2) and I got a heatmap by location (mouth and lung) running: plot_heatmap(otu2, sample.label="Lung") So thank you, I'm getting closer! If I want to run it so mouth is on one side and lung is on the either or if I want to run it with the paired data (mouth and lung from the same patient), is there code for that? THANK YOU!

jeffkimbrel commented 8 years ago

You can order by using sample.order="xyz" within the plot_heatmap() function. Check out ?plot_heatmap for more ways to customize the plots.

If you want to sort the order in a way that you can't easily do with sample.order, you can create a vector the of the sample names, which in your case might be the "2117-S, 2195-M4, etc." data by the following:

orderedVector = c("2117-S","2195-M4",etc...)
plot_heatmap(otu2, sample.order = orderedVector)

Note that the names have to be exact, and if you have samples in your phyloseq object that aren't in your list, they will be grouped together under N/A.

lisawemo commented 8 years ago

orderedVector = c("2117-S","2195-M4",etc...) plot_heatmap(otu2, sample.order = orderedVector)

This worked, thank you so much!!

lisawemo commented 8 years ago

Hi Jeff and anyone else, I've been advised to create a heatmap looking at the differences directly for each pair on the log scale so I won't have so many columns. They'd like it to be black for no change and red for a lot of change. Can anyone help me out with how to do that? Thanks!

joey711 commented 8 years ago

To be honest I think you're better off showing this with a scatter plot. It is a more quantitative graphical representation than color scale, and you'll have more options for grouping taxa, adjusting scale, etc.

See my DESeq2 example here:

http://joey711.github.io/phyloseq-extensions/DESeq2.html

The graphic at the bottom is one such custom ggplot2 example. You could modify as you like, but the basic skeleton is there.

I will close for now. I think this addresses the heart of your question, provided your paired test was executed correctly in the first place. You did not show any code for the test itself, unless I missed something.

lisawemo commented 8 years ago

Hi Joey, Thank you so much for getting back to me. Here is the code I ran using:

data=phyloseq_to_deseq2(otu2,~Paired+Lung)

results=DESeq(data,test="Wald",fitType="parametric") res=results(results,cooksCutoff=F)

alpha=0.05 results2=res[which(res$padj<alpha),] results2

And here is the outcome I got: log2 fold change (MAP): Lung Mouth vs Lung Wald test p-value: Lung Mouth vs Lung DataFrame with 20 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj

1105876 10.77406 -3.501271 0.8784393 -3.985786 6.725723e-05 0.0022344348 1541939 36.23552 -2.588955 0.7958687 -3.252992 1.141966e-03 0.0189693171 971907 183.71297 -2.348435 0.7140730 -3.288788 1.006197e-03 0.0176972348 4432430 8.49948 -3.689744 0.9202962 -4.009301 6.089879e-05 0.0022344348 865469 34.78782 -3.869854 0.8429563 -4.590812 4.415246e-06 0.0001885941 ... ... ... ... ... ... ... 1616059 206.238498 -1.225655 0.3223376 -3.802395 1.433041e-04 3.299049e-03 922964 198.965274 -2.576183 0.7423500 -3.470308 5.198609e-04 9.714901e-03 4369035 6.504649 1.780149 0.3188186 5.583580 2.356172e-08 1.761238e-06 4334651 55.578786 -4.495230 0.9363479 -4.800812 1.580232e-06 7.874822e-05 2783638 87.651648 -2.899618 0.9345616 -3.102650 1.917961e-03 3.018266e-02 Which I have interpreted that 10 out of 1500+ OTUs were found to be significantly different. So I am trying to use the website you posted for the scatter plot. Like I have mentioned before, I have never used R before so I'm still trying to get used to it. So for instance, I don't understand this: scale_fill_discrete <- function(palname = "Set1", ...) { scale_fill_brewer(palette = palname, ...) What am I supposed to use for "Set1" and all the "..." In addition, I ran this: x = tapply(results2$log2FoldChange, results2$Rank2, function(x) max(x)) x = sort(x, TRUE) results2$Rank2 = factor(as.character(results2$Rank2), levels=names(x)) and got this error: Error in factor(as.character(results2$Rank2), levels = names(x)) : object 'x' not found So I'm unsure what x is supposed to be here. Thanks again for your help! As I am very lost.