Hello,
I'm currently facing a bit of a puzzle in my analysis of microbial communities and was hoping to draw on the collective wisdom of this forum for insights.
In my recent work, I've been using the phyloseq package in R to analyze 16S rRNA gene sequencing data. Part of my analysis involves calculating the relative abundances of different microbial taxa within my samples. To check my findings, I compared my calculated relative abundances with those reported in MicrobiomeDB for similar microbial communities.
Surprisingly, I noticed substantial differences between the relative abundances I calculated using phyloseq and those listed in MicrobiomeDB. These discrepancies are puzzling and potentially significant for the interpretation of my results.
Before delving deeper into troubleshooting and comparisons, I wanted to reach out and ask if anyone here has experienced similar issues or might have insights into potential causes for such differences. Specifically, I'm curious about the following:
Are there known methodological differences between how phyloseq and MicrobiomeDB calculate relative abundances that could account for these discrepancies?
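To make the question concrete, here is a minimal base-R sketch of one kind of methodological difference I have in mind: pooling counts within a group before normalizing versus normalizing each sample and then averaging. The counts are made-up toy numbers, not my data, and I do not know which of the two orderings (if either) MicrobiomeDB uses; that is an open assumption.

```r
# Toy count matrix (made-up numbers): rows = samples in one group, columns = genera
counts <- matrix(
  c(90, 10,
    10, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("GenusA", "GenusB"))
)

# Ordering 1: sum counts across the group's samples, then normalize
pooled <- colSums(counts)
rel_pooled <- pooled / sum(pooled)

# Ordering 2: normalize each sample to proportions, then average across samples
rel_per_sample <- counts / rowSums(counts)
rel_mean <- colMeans(rel_per_sample)

print(round(rel_pooled, 3))  # GenusA 0.833, GenusB 0.167
print(round(rel_mean, 3))    # GenusA 0.7,   GenusB 0.3
```

With unequal sequencing depth across samples, the two orderings give different group-level proportions, so this is one place where two tools could disagree on the same data.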
Here is my R code:
`library(phyloseq)
library(microbiome)  # provides aggregate_rare() and transform()

# Agglomerate taxa at genus level
pseq2 <- aggregate_rare(ps, level = "Genus", detection = 0.0001, prevalence = 50/100)
# Merge samples by group (counts are summed)
pseq2 <- merge_samples(pseq2, "group")

# Calculate relative abundance
pseq2 <- microbiome::transform(pseq2, "compositional")

# Top N taxa
N <- 20
top <- head(names(sort(taxa_sums(pseq2), decreasing = TRUE)), N)

# Subset object to top N taxa
pseq2.top <- prune_taxa(top, pseq2)
otumerged2 <- otu_table(pseq2.top)`
Here are the results in R:
and here are the MicrobiomeDB results:
For example, Blautia, Bacteroides, and Faecalibacterium do not appear in the MicrobiomeDB plot.
Thank you in advance for your help, and I am looking forward to your responses!