joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
569 stars 187 forks source link

Tax glom not operating on phyloseq object #1619

Open paddyhooper opened 1 year ago

paddyhooper commented 1 year ago

Hi,

I am hoping to agglomerate my phyloseq object by phylum. I have done this with other phyloseq objects and had no issues but when I try agglomerate based on "Phylum" taxonomic rank it returns the same number of ASVs as unagglomerated

My phyloseq object is made from an ASV count table, a taxonomy table, and metadata file.

phyloseq-class experiment-level object otu_table() OTU Table: [ 15250 taxa and 32 samples ] sample_data() Sample Data: [ 32 samples by 21 sample variables ] tax_table() Taxonomy Table: [ 15250 taxa by 8 taxonomic ranks ]

unique(tax_table(ps_16S_Ave)[,"Phylum"])

Output: Taxonomy Table: [49 taxa by 1 taxonomic ranks]: Phylum
ASV1 "Firmicutes"
ASV2 "Proteobacteria"
ASV3 "Cyanobacteria"
ASV7 "Actinobacteriota"
ASV13 "Verrucomicrobiota"
ASV14 "Gemmatimonadota"
ASV22 "Bacteroidota"
ASV42 "Chloroflexi"
ASV78 "Myxococcota"
ASV131 "Planctomycetota" ASV162 "Acidobacteriota"
ASV171 "Euryarchaeota"
ASV214 "Desulfobacterota"
ASV301 "MBNT15"
ASV475 "Halobacterota"
ASV537 "Spirochaetota"
ASV700 "Patescibacteria"
ASV850 "NB1-j"
ASV1049 "Cloacimonadota"
ASV1055 "Dependentiae"
ASV1098 "Nitrospinota"
ASV1123 "Campylobacterota"
ASV1173 "Sumerlaeota"
ASV1185 "SAR324_clade(Marine_group_B)" ASV1200 "Thermoplasmatota"
ASV1263 "Armatimonadota"
ASV1347 "Bdellovibrionota"
ASV1687 "WPS-2"
ASV1855 "Nitrospirota"
ASV1957 "Deinococcota"
ASV2009 "WS4"
ASV3031 "Zixibacteria"
ASV3283 "Latescibacterota"
ASV3379 "Methylomirabilota"
ASV3682 "Sva0485"
ASV4215 "Fibrobacterota"
ASV4693 "Nanoarchaeota"
ASV4976 "Caldisericota"
ASV5020 "FCPU426"
ASV5375 "Hydrogenedentes"
ASV5601 "Crenarchaeota"
ASV5890 "Abditibacteriota"
ASV8621 "Deferrisomatota"
ASV8692 "WS1"
ASV9982 "Elusimicrobiota" ASV10597 "Firestonebacteria"
ASV11903 "WS2"
ASV12037 "Margulisbacteria"
ASV15732 "Micrarchaeota"

From this I presume, my tax_glom should produce a phyloseq object with 49 taxa (one per phyla) but after a long processing time it returns:

ps_16S_phylum <- tax_glom(ps_16S_Ave, "Phylum") ps_16S_phylum

phyloseq-class experiment-level object otu_table() OTU Table: [ 15250 taxa and 32 samples ] sample_data() Sample Data: [ 32 samples by 21 sample variables ] tax_table() Taxonomy Table: [ 15250 taxa by 8 taxonomic ranks ]

Any ideas why it would not be agglomerating when it can recognise the phyla in my phyloseq object?

Many thanks in advance! Paddy

gmteunisse commented 1 year ago

Hi Paddy, could you post a reproducible example? For example, does tax_glom also fail when using the GlobalPatterns dataset?

PS I see that you left the NArm to its default of TRUE in tax_glom - is this the behaviour that you want?

paddyhooper commented 1 year ago

Hi there, thanks very much for your quick response!

I have repeated my code with the GlobalPatterns Dataset, and get the expected result below:

data(GlobalPatterns) GP = GlobalPatterns GP

> GP

phyloseq-class experiment-level object

otu_table() OTU Table: [ 19216 taxa and 26 samples ]

sample_data() Sample Data: [ 26 samples by 7 sample variables ]

tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]

phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]

unique(tax_table(GP)[,"Phylum"] )

Abbreviated output:

Taxonomy Table: [67 taxa by 1 taxonomic ranks]:

Phylum

549322 "Crenarchaeota"

173903 "Euryarchaeota"

368907 "Actinobacteria"

51888 "Spirochaetes"

591769 "MVP-15"

29529 "Proteobacteria"

236160 "SBR1093"

11293 "Fusobacteria"

113012 "Tenericutes"

146168 "ZB3"

167052 "Cyanobacteria"

95866 "GOUTA4"

522127 "TG3"

255842 "Chlorobi"

554668 "Bacteroidetes"

etc.

GP_phylum <- tax_glom(GP, "Phylum", NArm = FALSE) #this time i did the NArm GP_phylum

> GP_phylum

phyloseq-class experiment-level object

otu_table() OTU Table: [ 67 taxa and 26 samples ]

sample_data() Sample Data: [ 26 samples by 7 sample variables ]

tax_table() Taxonomy Table: [ 66 taxa by 7 taxonomic ranks ]

phy_tree() Phylogenetic Tree: [ 66 tips and 65 internal nodes ]

I tried rerunning code with 'naRM = FALSE' and got the same failed result:

Agglomerate your phyloseq object by phylum, NArm = FALSE (do not remove ASVs unassigned at phylum level)

ps_16S_phylum <- tax_glom(ps_16S_Ave_tree, "Phylum", NArm = FALSE) ps_16S_phylum

Output

phyloseq-class experiment-level object

otu_table() OTU Table: [ 15250 taxa and 32 samples ]

sample_data() Sample Data: [ 32 samples by 21 sample variables ]

tax_table() Taxonomy Table: [ 15250 taxa by 8 taxonomic rank

This makes me think it must be an issue with the structure of my phyloseq object? Is there a good way to look for errors in my data structure ?

So far I have wondered if: 1) perhaps I have an issue in the formatting of the .txt files I used to make the datasheet, but i can't see an issue as I am able to plot my ASVs by phylum using the phyloseq command plot_bar(ps_16S_phylum, x="Phylum") 2) I tried attaching a tree object to my phyloseq object, but this didn't make a difference, and I don't believe is required for tax_glom?

Many thanks again for your help! Paddy

gmteunisse commented 1 year ago

I think you're right that it must somehow be related to your phyloseq object, but it's hard to figure out what it is without having access to it. You're right that tax_glom doesn't require a tree object.

Judging by the code, tax_glom does the following interally:

  1. Find the index CN of the column in the tax table that corresponds to the given column.
  2. Remove anything that needs to be removed when NArm = T.
  3. Join the names of columns in tax table 1:CN using ";_;", while discarding any lower taxonomy. This results in a taxonomy name.
  4. Create a list where each name corresponds to a unique taxonomy name, and the list items corresponds to the ASVs with that name.
  5. Merge the taxa in each list using merge_taxa.

It seems that your object runs into an issue at step 4: somehow, each list item contains only a single ASV. Can you check this by running the following chunk of code?

physeq <- <your_physeq_object>
taxrank <- "Phylum"
CN <- which(rank_names(physeq) %in% taxrank[1])
tax <- as(access(physeq, "tax_table"), "matrix")[, 1:CN, 
    drop = FALSE]
  tax <- apply(tax, 1, function(i) {
    paste(i, sep = ";_;", collapse = ";_;")
  })
spCliques <- tapply(names(tax), factor(tax), list)
spCliques
paddyhooper commented 1 year ago

Hi, sorry not too sure i follow what the output is meant to give me but the output of spCliques is a long output of asv IDs

The first 10 are:

spCliques[1:10] $ASV1;_;Bacteria;_;Firmicutes [1] "ASV1"

$ASV10;_;Bacteria;_;Proteobacteria [1] "ASV10" $ASV100;_;Bacteria;_;Bacteroidota [1] "ASV100"

$ASV10000;_;Bacteria;_;Abditibacteriota [1] "ASV10000"

$ASV10001;_;Bacteria;_;Chloroflexi [1] "ASV10001"

$ASV10002;_;Bacteria;_;Chloroflexi [1] "ASV10002" $ASV10003;_;Bacteria;_;Verrucomicrobiota [1] "ASV10003"

$ASV10004;_;Bacteria;_;Planctomycetota [1] "ASV10004"

$ASV10005;_;Bacteria;_;Verrucomicrobiota [1] "ASV10005"

$ASV10006;_;Bacteria;_;Proteobacteria [1] "ASV10006"

gmteunisse commented 1 year ago

Thank you. The issue seems to be that in your tax table, there must be a column on index 1 with the ASV names. This is why each ASV is put into its own group. Remove that column (or move it to the end of your taxonomy table) and then your issue should be resolved. Let me know if that works for you.


From: Paddy Hooper @.> Sent: Thursday, September 1, 2022 5:22:27 PM To: joey711/phyloseq @.> Cc: gmteunisse @.>; Comment @.> Subject: Re: [joey711/phyloseq] Tax glom not operating on phyloseq object (Issue #1619)

Hi, sorry not too sure i follow what the output is meant to give me but the output of spCliques is a long output of asv IDs

The first 10 are:

spCliques[1:10] $ASV1;;Bacteria;;Firmicutes [1] "ASV1"

$ASV10;;Bacteria;;Proteobacteria [1] "ASV10" $ASV100;;Bacteria;;Bacteroidota [1] "ASV100"

$ASV10000;;Bacteria;;Abditibacteriota [1] "ASV10000"

$ASV10001;;Bacteria;;Chloroflexi [1] "ASV10001"

$ASV10002;;Bacteria;;Chloroflexi [1] "ASV10002" $ASV10003;;Bacteria;;Verrucomicrobiota [1] "ASV10003"

$ASV10004;;Bacteria;;Planctomycetota [1] "ASV10004"

$ASV10005;;Bacteria;;Verrucomicrobiota [1] "ASV10005"

$ASV10006;;Bacteria;;Proteobacteria [1] "ASV10006"

— Reply to this email directly, view it on GitHubhttps://github.com/joey711/phyloseq/issues/1619#issuecomment-1234432177, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AHBWS3SXR2VWD7IPXUMGG5TV4DC3HANCNFSM6AAAAAAQCIGHB4. You are receiving this because you commented.Message ID: @.***>

paddyhooper commented 1 year ago

Bingo! I am sorry for not noticing sooner that I had a column with the ASVIDs within my taxonomy table! I removed this column and re-ran as below and it worked perfectly.

checking glom

ps_16S_ave rank_names(ps_16S_ave) ps_16S_ave_phylum <- tax_glom(ps_16S_ave, "Phylum", NArm = FALSE) ps_16S_ave_phylum

phyloseq-class experiment-level object

otu_table() OTU Table: [ 49 taxa and 32 samples ]

sample_data() Sample Data: [ 32 samples by 21 sample variables ]

tax_table() Taxonomy Table: [ 49 taxa by 7 taxonomic ranks ]

Thank you so much for your help! A good lesson in double checking my tables before pushing them all into phyloseq!

All best wishes, Paddy

gmteunisse commented 1 year ago

No worries, glad we were able to resolve it!