joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
579 stars 188 forks source link

Inconsistency between average of relative abundance for taxa and average relative abundance following merge by group #793

Open janetw opened 7 years ago

janetw commented 7 years ago

Hello, So my apologies if this is already explained somewhere but I am getting a difference in the average relative abundance for a taxa within a group depending on how I go about calculating the average.

If I follow the tutorials, e.g. the restroom biogeography example, and merge my samples by my group factor, then transform to relative abundance, I get a different result then if I average the relative abundance values within my group factor after calling the transform_sample_counts(). Does this make sense?

I think my issue is because I have varying read depth across my samples. And as such the averages don't match. I'm guessing that if my read depth is the same across all of the samples or if I rarefy to an even depth then this works. Or am I missing something?

I've uploaded a mini example which I think shows the problem Example_merge_samples_to_get_AVG.xlsx

That being the case, we are hoping not to rarefy the data but would like to generate bar plots that show the average by group as done in the restroom biogeography example. Do you have any other suggestions on how to use the phylosesq object to generate a plot by group other than what is described?

Also here is the code that I have been working with but I think it is pretty close to what is in those tutorials except with my variables.

Merge samples by Treatment group, then transform to relative abundance

genus_glom_CM_mergeSamples <- merge_samples(genus_glom_CM, "TRMT")
sample_data(genus_glom_CM_mergeSamples)$TRMT <- levels(sample_data(genus_glom_CM)$TRMT)
genus_glom_CM_mergeSamples <- transform_sample_counts(genus_glom_CM_mergeSamples, function(x) 100 * x/sum(x))

Put back in the levels

sample_data(genus_glom_CM_mergeSamples)$TRMT <- factor(sample_data(genus_glom_CM_mergeSamples)$TRMT, levels = c("Control", "125", "250"))

Plot top 30

top30_genus_glom_CM_mergeSamples_names <- names(sort(taxa_sums(genus_glom_CM_mergeSamples), decreasing=TRUE))[1:30]

genus_glom_CM_mergeSamples.top30 <- prune_taxa(top30_genus_glom_CM_mergeSamples_names, genus_glom_CM_mergeSamples)

Plot using custom color palette

plot_bar(genus_glom_CM_mergeSamples.top30, x="TRMT", fill="Genus") + scale_fill_manual(values=custom_col30_C)

Facet_wrap by Genus

plot_bar(genus_glom_CM_mergeSamples.top30, x="TRMT", fill="Genus") + scale_fill_manual(values=custom_col30_C) + facet_wrap(~Genus)

Thanks in advance! Best regards, Janet

joey711 commented 6 years ago

https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example