joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Doubled abundance value after tax_glom #1689

Open bennend opened 12 months ago

bennend commented 12 months ago

Hi,

I used tax_glom to extract a genus-level phyloseq object from a complete phyloseq object, and I accidentally found that the genus abundances were doubled after tax_glom. Do you know why that happens?

My original phyloseq object, phyloseq_BASIC, is:

phyloseq_BASIC

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1359 taxa and 832 samples ]
sample_data() Sample Data:       [ 832 samples by 66 sample variables ]
tax_table()   Taxonomy Table:    [ 1359 taxa by 7 taxonomic ranks ]

The tax_table of the phyloseq_BASIC object looks like this (only showing the first few rows/cols):

ID | Kingdom | Phylum | Class | Order | Family | Genus | Species
k__Bacteria | Bacteria | NA | NA | NA | NA | NA | NA
p__Proteobacteria | Bacteria | Proteobacteria | NA | NA | NA | NA | NA
c__Gammaproteobacteria | Bacteria | Proteobacteria | Gammaproteobacteria | NA | NA | NA | NA
o__Enterobacterales  | Bacteria | Proteobacteria | Gammaproteobacteria | Enterobacterales | NA | NA | NA
f__Enterobacteriaceae | Bacteria | Proteobacteria | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | NA | NA
g__Escherichia | Bacteria | Proteobacteria | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | NA

Before tax_glom, the abundance of g__Escherichia looks like this (only showing the first few rows/cols):

otu_table(phyloseq_BASIC) %>% .['g__Escherichia', ]
ID | 20000012582_E100013748 | 20000012544_E100009636 | 20000012476_E100009636
g__Escherichia | 0.03826 | 0.04254 | 0.01882

After tax_glom, the phyloseq object is phylum_phyloseq_BASIC:

phylum_phyloseq_BASIC <- phyloseq_BASIC %>% tax_glom(., taxrank = 'Genus')
phylum_phyloseq_BASIC

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 273 taxa and 832 samples ]
sample_data() Sample Data:       [ 832 samples by 66 sample variables ]
tax_table()   Taxonomy Table:    [ 273 taxa by 7 taxonomic ranks ]

And the abundance after tax_glom looks like:

otu_table(phylum_phyloseq_BASIC) %>% .['g__Escherichia', ]

ID | 20000012582_E100013748 | 20000012544_E100009636 | 20000012476_E100009636
g__Escherichia | 0.07652 | 0.08508 | 0.03764

Any feedback is appreciated! Thanks!

Best, Ben

samd1993 commented 11 months ago

I would compare the relative abundance values, which hopefully should be the same before and after. I've seen wonky things like this in phyloseq too, where abundances double or get cut in half after some kind of transformation. I'm not sure why, though. It could have something to do with unidentified species being counted in after glomming at the genus level, since these usually make up 50% of microbiome data?
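
One possibility worth ruling out: the tax_table shown above has a separate row for every rank (k__, p__, c__, o__, f__, g__), which suggests the table may already contain a genus-level total alongside species-level rows for the same genus. If so, tax_glom at 'Genus' would sum the g__ row with its own species rows and report roughly twice the original value, which matches the exact 2x ratio in the numbers above. The sketch below reproduces that situation on a toy object. The species rows, the g__Klebsiella row, and all values are made up for illustration, and whether phyloseq_BASIC actually contains s__ rows is an assumption, not something the thread confirms.

```r
library(phyloseq)

# Hypothetical toy table: one genus-level row plus two species-level rows
# whose abundances add up to the genus row, and a second genus with no species rows.
taxa_ids <- c("g__Escherichia", "s__Escherichia_coli",
              "s__Escherichia_albertii", "g__Klebsiella")

otu <- matrix(
  c(0.04, 0.03, 0.01, 0.02,   # sample S1
    0.08, 0.05, 0.03, 0.01),  # sample S2
  nrow = 4,
  dimnames = list(taxa_ids, c("S1", "S2"))
)

tax <- matrix(
  c("Bacteria", "Bacteria", "Bacteria", "Bacteria",
    "Escherichia", "Escherichia", "Escherichia", "Klebsiella",
    NA, "coli", "albertii", NA),
  nrow = 4,
  dimnames = list(taxa_ids, c("Kingdom", "Genus", "Species"))
)

ps <- phyloseq(otu_table(otu, taxa_are_rows = TRUE), tax_table(tax))

# Diagnostic: how many rows share each Genus label?
# More than one row per genus means tax_glom will add several rows together.
print(table(as(tax_table(ps), "matrix")[, "Genus"], useNA = "ifany"))

# Agglomerate at genus level, as in the original post.
glom <- tax_glom(ps, taxrank = "Genus")

# Compare the Escherichia abundance before and after agglomeration.
before   <- as(otu_table(ps), "matrix")["g__Escherichia", ]
glom_tax <- as(tax_table(glom), "matrix")
glom_id  <- rownames(glom_tax)[glom_tax[, "Genus"] == "Escherichia"]
after    <- as(otu_table(glom), "matrix")[glom_id, ]

print(after / before)
```

Running the same `table(...)` diagnostic on phyloseq_BASIC would show whether any genus appears in more than one row. If it does, one option is to keep only a single rank's rows (for example, only the species-level rows, or only the g__ rows) before calling tax_glom, so that each abundance is counted once.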