joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
579 stars 188 forks source link

Waste Not, Want Not... Proper way to perform many pair-wise comparisons? #330

Closed mikerobeson closed 10 years ago

mikerobeson commented 10 years ago

Hi Joey,

I am trying to compare microbial communities across ~20 plant genotypes. I would like to compare the differences in OTUs for each pair-wise genotype comparison. Below are two ways in which I was thinking of doing the analysis. Each approach returns quite different results in terms of the type and number of significantly different OTUs for each pair-wise comparison. Many of the commands are based on the tutorials I've come across...

1st approach

Process the data in the phyloseq object my.data:

all.geno.deseq2 <- phyloseq_to_deseq2(my.data, ~Genotype) all.geno.deseq2 <- DESeq(all.geno.deseq2)

Iterate through and analyze each pair of genotypes via the contrast option of results.

curr.pair.res <-results(all.geno.deseq2,contrast=list("Genotype.102","Genotype.103")) curr.pair.res <- curr.pair.res[order(curr.pair.res$padj, na.last = NA), ] alpha = 0.01 sigtab = curr.pair.res[(curr.pair.res$padj < alpha), ]

sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(my.data)[rownames(sigtab)], "matrix"))

head(sigtab)

Repeat the above, via the contrast option, for all other comparisons. e.g.: Genotype.102 vs Genotype.104 Genotype.102 vs Genotype.105 ....

2nd approach

I iterate through my phyloseq object my.data, by pruning away all the data except for the pair of genotypes I wish to compare. Then run phyloseq_to_deseq2, only on that data. Then grab results, etc:

curr.pair <- subset_samples(my.data, Genotype %in% c("Genotype.102","Genotype.103")) curr.pair.empty.otus.removed <- prune_taxa(taxa_sums(curr.pair) > 0, curr.pair) geno.deseq2 <- phyloseq_to_deseq2(curr.pair.empty.otus.removed, ~Genotype) geno.deseq2 <- DESeq(geno.deseq2)

res = results(geno.deseq2) res = res[order(res$padj, na.last = NA), ] alpha = 0.01 sigtab = res[(res$padj < alpha), ] sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(curr.pair.empty.otus.removed)[rownames(sigtab)], "matrix")) head(sigtab)

Again, repeat the above for all other comparisons. e.g.: Genotype.102 vs Genotype.104 Genotype.102 vs Genotype.105 ....

Any suggestions as to which of these two approaches, or perhaps another better approach, is most appropriate? Do, I use the results(... contrast=...) option for my pairwise comparisons (procedure 1), or do I treat each pair-wise comparison as a separate isolated analysis (procedure 2)?

In a nutshell, I want to determine how particular OTUs are enriched or depauperate for each pair-wise plant genotype comparison. Hopefully, I made some sense. :-)

Thanks for the great work on phyloseq! -M

joey711 commented 10 years ago

Quick comment (and I don't know the details of your experiment or goals), but, both of your proposals have a large number of multiple hypotheses for which you must account. For instance, corrected p-values should be included in the results of each individual plant genotype pair, but your results will not be honest unless you also account for the fact that you're performing all 190 or so pairwise comparisons.

There are ways to do this, but nothing "off the shelf" within phyloseq will do exactly what you're asking.

That's my first concern, and addresses the question about phyloseq. Please migrate further discussion about the specifics of doing this with respect to the waste-not article and experimental design to:

https://github.com/joey711/waste-not-supplemental/issues