tax_glom then psmelt centered-log-ratio transformed data?

joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:

584 stars 187 forks source link

I wanted to plot abundances using centered log ratio, so I did the transformation. Then, I needed to agglomerate my taxa by some specific taxonomic level, e.g. Phylum, and after that I psmelt-ed the resulting table to plot.

The question I have: is it correct to perform tax_glom() and psmelt() on centered log ratio-ed data? Are these re-organisations of the dataset affecting the counts, somehow? I feel it might not be a good idea but I am not sure (I do that with relative abundances, but then these are not log-transformed data).

The only two piece of info i found are this post and this FAQ but they are not useful, for my purpose.

A general idea of how this works:

# aggregate
phylo_aggregated <- tax_glom(phylo_data, taxrank=taxa_level)

# psmelt
psmelted <- psmelt(phylo_aggregated)

# plot
psmelted[psmelted$TimePoint==timepoint, ] %>% # subset by timepoint
    filter(Abundance >= threshold | Abundance <= -threshold) %>% # subset by abundance
        mutate(Phylum = fct_reorder(Phylum, Abundance, .desc=T)) %>% # reorder by abundance
        ggplot(aes(x=Treatment, y=Abundance, fill=Phylum)) + # plot
        geom_boxplot() +
        scale_fill_viridis(discrete=T, option="turbo") +
        theme_bw() +
        theme(text=element_text(size=25, face="bold"), legend.position="none", axis.text.x = element_text(angle=0), strip.text=element_text(size=15)) +
        guides(fill=guide_legend(ncol=1)) +
        xlab("") +
        ylab("") +
        facet_wrap(~Phylum) +
        ggtitle(paste0("Taxa composition in ", timep, " at ", taxa_level, " level"))

# aggregate phylum phylo_agglomerated <- tax_glom(phylo_data, taxrank=taxa_level) # create centered log ratio object ... # get names of best n taxa max_taxa <- 6 best_taxa <- names(sort(taxa_sums(phylo_agglomerated), decreasing=TRUE)[1:max_taxa]) # get best max_taxa from the centered log ratio object phylo_thresholded <- prune_taxa(best_taxa, phylo_clr) # psmelt psmelted <- psmelt(phylo_thresholded) # plot psmelted %>% filter(TimePoint==timep) %>% # subset by time point filter(Treatment==a_treat) %>% # filter by treatment group_by(DNA_RNA) %>% # group by treatment mutate(Phylum = fct_reorder(Phylum, Abundance, .desc=T)) %>% # reorder by abundance ggplot(aes(x=DNA_RNA, y=Abundance, fill=Phylum)) + # plot geom_boxplot() + scale_fill_viridis(discrete=T, option="turbo") + theme_bw() + theme(text=element_text(size=25, face="bold"), legend.position="none", axis.text.x = element_text(angle=0), strip.text=element_text(size=15)) + guides(fill=guide_legend(ncol=1)) + xlab("") + ylab("") + facet_wrap(~Phylum) + ggtitle(paste0(taxa_level, " composition comparison between DNA and RNA in ", timep))

joey711 / phyloseq

tax_glom then psmelt centered-log-ratio transformed data? #1636