joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Tax_glom not aggregating #1303

Open Mrudhulaks opened 4 years ago

Mrudhulaks commented 4 years ago

Hi,

I am trying to apply the tax_glom function to my phyloseq object, but it looks like it is not working.

The file I am working on has 768 OTUs and 201 samples.

The code I used: OTU.Phylum = tax_glom(physeq, taxrank = "Phylum", NArm = FALSE)

The dimensions after running this function are the same as before.

So I tried the code at a different taxonomic level.

OTU.genus = tax_glom(physeq, "Genus")

However, the result from this is also the same.

Could you please help me resolve this?

Thank you.

Mathildebd commented 3 years ago

Did you solve this? I was facing the same problem, i.e. tax_glom() was not "merging"/aggregating taxa (as the documentation says it should) but was just subsetting my data. That is, if I applied gen <- tax_glom(ps16, taxrank="genus") I would get a dataset containing only ASVs that had been assigned to genera, but not a summed data table.

I also tried with aggregate_taxa() from the "microbiome" R package.

By running taxa_names(gen)[1:2] I identified the problem since it would return this:

TAACACGTAGGGCGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGTGGTTTGCTACGTCCGCTGTGAAAACCTAGGGCTTAACCCTGGGCTTGCAGTGGATACGGACAGACTAGAGGTAGGTAGGGGAGAATGGAATTCCCGGTGTAGCGGTGAAATGCGCAGATATCGGGAGGAACACCAGTGGCGAAGGCGGTTACCTGGTCCTGCACTGACGCTGATGCACGAAAGCTGGGGGAGCAAACGGGATTd:Bacteria(1.0000),p:Actinobacteria(0.9500),c:Actinobacteria(0.9025),o:Actinomycetales(0.8484),f:Thermomonosporaceae(0.5514),g:Actinoallomurus(0.2316)+_Bacteria_Actinobacteria_Actinobacteria_Actinomycetales__
""
TAACACGTAGGGCGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGTGGTTTGCTACGTCCGCTGTGAAAACCTAGGGCTTAACCCTGGGCTTGCAGTGGATACGGACAGACTAGAGGTAGGTAGGGGAGAATGGAATTCCCGGTGTAGCGGTGAAATGCGCAGATATCGGGAGGAACACCGGTGGCGAAGGCGGTTCTCTGGGCCTTACCTGACACTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTd:Bacteria(1.0000),p:Actinobacteria(1.0000),c:Actinobacteria(1.0000),o:Actinomycetales(1.0000),f:Thermomonosporaceae(0.8700),g:Actinoallomurus(0.6960)+_Bacteria_Actinobacteria_Actinobacteria_ActinomycetalesThermomonosporaceae ""

This is because, when putting together my phyloseq object, I had kept the first two columns of my taxonomic assignment, which are the sequence itself and the "match-score string", i.e. a structure like this:

  seq
1 TAACACCGGCAGCTCAAGTGGTGGCCATTATTATTGGGCCTAAAGCGTTCGTAGCCGGTTTGATAAGTCTCTGGTGAAATCCCGCAGCTTAACTGTGGGACTTGCTGGAGATACTATTAGACTTGAGGTCGGGAGAGGTTAGGGGTACTCCCAGGGTAGGGGTGAAATCCTATAATCCTGGGAGGACCACCTGTGGCGAAGGCGCCTAACTGGAACGAACCTGACGGTGAGTAACGAAAGCCAGGGGCGCGAACCGGATT
2 TAATACCTGCAGCCCAAGTGGTGGTCGATTTTATTGAGTCTAAAACGTTCGTAGCCGGTCTGATAAATCCTTGGGTAAATCGGAAAGCTTAACTTTCCGAATTCCGAGGAGACTGTCAGACTTGGGACCGGGAGAGGCTAGAGGTACTTCTGGGGTAGGGGTAAAATCCTGTAATCCTAGAAGGACCACCGGTGGCGAAGGCGTCTAGCTAGAACGGATCCGACGGTGAGGGACGAAGCCCTGGGTCGCAAACGGGATT
  string
1 d:Archaea(1.0000),p:"Euryarchaeota"(1.0000),c:Methanobacteria(1.0000),o:Methanobacteriales(1.0000),f:Methanobacteriaceae(1.0000),g:Methanobacterium(1.0000)
2 d:Archaea(1.0000),p:"Euryarchaeota"(1.0000),c:Thermoplasmata(0.9900),o:Methanomassiliicoccales(0.9801),f:Methanomassiliicoccaceae(0.9703),g:Methanomassiliicoccus(0.9606)
  sep domian  phyla         class           order                   family                   genus
1 +   Archaea Euryarchaeota Methanobacteria Methanobacteriales      Methanobacteriaceae      Methanobacterium
2 +   Archaea Euryarchaeota Thermoplasmata  Methanomassiliicoccales Methanomassiliicoccaceae Methanomassiliicoccus

By deleting these first columns and thus keeping only those with properly trimmed taxonomy, I now get this:

taxa_names(gen)[5:10]
[1] "Methanomassiliicoccus"
[2] "Archaea_Thaumarchaeota__"
[3] "Archaea_Thaumarchaeota_Thaumarchaeota_o:Nitrososphaerales"
[4] "Archaea_Thaumarchaeota_Thaumarchaeota_o:Nitrososphaeralesf:Nitrososphaeraceae"
[5] "g:Nitrososphaera"
[6] "Bacteria_____"
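For anyone wanting to do the same clean-up, here is a minimal base-R sketch of dropping the non-taxonomy columns before building the taxonomy table. The data frame `assign` and its shortened values are made up for illustration; only the column names follow the structure printed above, and dropping `sep` along with `seq` and `string` is an assumption.

```r
# Toy stand-in for the assignment table (hypothetical, shortened values;
# column names follow the printed structure, including "domian").
assign <- data.frame(
  seq    = c("TAACACC", "TAATACC"),
  string = c("d:Archaea(1.0000),g:Methanobacterium(1.0000)",
             "d:Archaea(1.0000),g:Methanomassiliicoccus(0.9606)"),
  sep    = c("+", "+"),
  domian = c("Archaea", "Archaea"),
  phyla  = c("Euryarchaeota", "Euryarchaeota"),
  genus  = c("Methanobacterium", "Methanomassiliicoccus"),
  stringsAsFactors = FALSE
)

# Keep only the trimmed rank columns; seq, string and sep are dropped
# (dropping sep as well is an assumption).
drop_cols <- c("seq", "string", "sep")
tax_mat <- as.matrix(assign[, !(names(assign) %in% drop_cols), drop = FALSE])

# Keep the sequences as row names so taxa can still be matched to the OTU table.
rownames(tax_mat) <- assign$seq

colnames(tax_mat)
```

A character matrix like `tax_mat` is the kind of input that phyloseq's tax_table() accepts, so the cleaned matrix can be used when rebuilding the phyloseq object.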

So the take-home message is that the functions tax_glom() (and aggregate_taxa()) aggregate across the whole tax_table(), and before the clean-up each row was unique, so no aggregation was performed.
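To make the effect concrete, here is a small base-R illustration of the grouping idea. It is only a sketch and not phyloseq's actual code: `glom_sketch` is a hypothetical helper, the counts and taxonomy are toy values, and it assumes taxa are grouped on a key built from every taxonomy column from the first one up to the chosen rank.

```r
# Toy counts: 3 ASVs x 2 samples (made-up numbers).
counts <- matrix(c(5, 1, 2, 3, 4, 0), nrow = 3,
                 dimnames = list(c("asv1", "asv2", "asv3"), c("s1", "s2")))

# Toy taxonomy with a unique sequence column in front of the ranks.
tax <- cbind(
  seq    = c("AAA", "AAC", "AAG"),
  domain = c("Bacteria", "Bacteria", "Archaea"),
  genus  = c("Actinoallomurus", "Actinoallomurus", "Methanobacterium")
)
rownames(tax) <- rownames(counts)

# Hypothetical helper: sum the rows of `counts` that share the same taxonomy
# from the first column of `tax` through the column named `rank`.
glom_sketch <- function(counts, tax, rank) {
  upto <- seq_len(match(rank, colnames(tax)))
  key  <- apply(tax[, upto, drop = FALSE], 1, paste, collapse = ";")
  rowsum(counts, group = key)
}

# With the sequence column, every key is unique and nothing is merged.
with_seq <- glom_sketch(counts, tax, "genus")

# Without it, the two Actinoallomurus ASVs collapse into one row.
without_seq <- glom_sketch(counts, tax[, c("domain", "genus")], "genus")

nrow(with_seq)
nrow(without_seq)
```

In this toy case the first call keeps all three rows, while the second returns two rows with the Actinoallomurus counts summed, which matches the behaviour described above.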

I hope this explanation makes sense!