joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Difference in relative abundance between MicrobiomeDB and r phyloseq #1739

Open kbenmd opened 3 months ago

kbenmd commented 3 months ago

Hello,

I'm currently facing a bit of a puzzle in my analysis of microbial communities and was hoping to draw on the collective wisdom of this forum for insights.

In my recent work, I've been using the phyloseq package in R to analyze 16S rRNA gene sequencing data. Part of my analysis involves calculating the relative abundances of different microbial taxa within my samples. To check the accuracy of my findings, I compared my calculated relative abundances with those reported in various MicrobiomeDB databases for similar microbial communities.

Surprisingly, I noticed substantial differences between the relative abundances I calculated using phyloseq and those listed in MicrobiomeDB. These discrepancies are puzzling and potentially significant for the interpretation of my results.

Before delving deeper into troubleshooting and comparisons, I wanted to reach out and ask if anyone here has experienced similar issues or might have insights into potential causes for such differences. Specifically, I'm curious about the following:

Are there known methodological differences between how phyloseq and MicrobiomeDB calculate relative abundances that could account for these discrepancies?

Here is my R code:

```r
library(phyloseq)
library(microbiome)  # provides aggregate_rare() and transform()

# Agglomerate taxa at genus level
pseq2 <- aggregate_rare(ps, level = "Genus", detection = 0.0001, prevalence = 50/100)

# Merge samples by group (counts are summed)
pseq2 <- merge_samples(pseq2, "group")

# Calculate relative abundance
pseq2 <- transform(pseq2, "compositional")

# Top N taxa
N <- 20
top <- names(sort(taxa_sums(pseq2), decreasing = TRUE))[1:N]

# Subset object to top N taxa
pseq2.top <- prune_taxa(top, pseq2)
otumerged2 <- otu_table(pseq2.top)
```
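
One thing worth checking is the order of operations: the code above sums counts within each group and only then converts to proportions, so deeply sequenced samples carry more weight than shallow ones. Whether MicrobiomeDB pools counts or averages per-sample proportions is not established in this thread. The sketch below only shows, with made-up counts and base R (no phyloseq objects), that the two orders can give different numbers. The matrix `counts` and the genus names are hypothetical.

```r
# Toy count matrix: 2 genera (rows) x 3 samples (columns), all in one group.
# All values are invented for illustration only.
counts <- matrix(
  c(90, 10,     # sample s1: 100 reads
    10, 90,     # sample s2: 100 reads
    800, 200),  # sample s3: 1000 reads
  nrow = 2,
  dimnames = list(c("GenusA", "GenusB"), c("s1", "s2", "s3"))
)

# Route 1: sum counts across samples, then normalize
# (the order used when merging samples before a compositional transform).
pooled <- rowSums(counts) / sum(counts)

# Route 2: normalize each sample first, then average the proportions.
per_sample <- sweep(counts, 2, colSums(counts), "/")
averaged <- rowMeans(per_sample)

print(round(pooled, 3))
print(round(averaged, 3))
```

In this toy case GenusA comes out at 0.75 when counts are pooled and at 0.6 when per-sample proportions are averaged, because sample s3 has ten times more reads than the others. Comparing both routes on the real data would show whether this accounts for any part of the discrepancy.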

Here are the results in R:

Screenshot 2024-03-24 at 9 49 13 PM

and here are the MicrobiomeDB results:

Screenshot 2024-03-24 at 9 48 06 PM

For example, Blautia, Bacteroides, and Faecalibacterium do not appear in the MicrobiomeDB plot.

Thank you in advance for your help, and I am looking forward to your responses!