Open paddyhooper opened 1 year ago
Hi Paddy, could you post a reproducible example? For example, does tax_glom also fail when using the GlobalPatterns
dataset?
PS I see that you left the NArm
to its default of TRUE
in tax_glom - is this the behaviour that you want?
Hi there, thanks very much for your quick response!
I have repeated my code with the GlobalPatterns Dataset, and get the expected result below:
data(GlobalPatterns) GP = GlobalPatterns GP
unique(tax_table(GP)[,"Phylum"] )
GP_phylum <- tax_glom(GP, "Phylum", NArm = FALSE) #this time i did the NArm GP_phylum
I tried rerunning code with 'naRM = FALSE' and got the same failed result:
ps_16S_phylum <- tax_glom(ps_16S_Ave_tree, "Phylum", NArm = FALSE) ps_16S_phylum
This makes me think it must be an issue with the structure of my phyloseq object? Is there a good way to look for errors in my data structure ?
So far I have wondered if: 1) perhaps I have an issue in the formatting of the .txt files I used to make the datasheet, but i can't see an issue as I am able to plot my ASVs by phylum using the phyloseq command plot_bar(ps_16S_phylum, x="Phylum") 2) I tried attaching a tree object to my phyloseq object, but this didn't make a difference, and I don't believe is required for tax_glom?
Many thanks again for your help! Paddy
I think you're right that it must somehow be related to your phyloseq object, but it's hard to figure out what it is without having access to it. You're right that tax_glom
doesn't require a tree object.
Judging by the code, tax_glom
does the following interally:
CN
of the column in the tax table that corresponds to the given column.NArm = T
.merge_taxa
.It seems that your object runs into an issue at step 4: somehow, each list item contains only a single ASV. Can you check this by running the following chunk of code?
physeq <- <your_physeq_object>
taxrank <- "Phylum"
CN <- which(rank_names(physeq) %in% taxrank[1])
tax <- as(access(physeq, "tax_table"), "matrix")[, 1:CN,
drop = FALSE]
tax <- apply(tax, 1, function(i) {
paste(i, sep = ";_;", collapse = ";_;")
})
spCliques <- tapply(names(tax), factor(tax), list)
spCliques
Hi, sorry not too sure i follow what the output is meant to give me but the output of spCliques is a long output of asv IDs
The first 10 are:
spCliques[1:10] $
ASV1;_;Bacteria;_;Firmicutes
[1] "ASV1"
$ASV10;_;Bacteria;_;Proteobacteria
[1] "ASV10"
$ASV100;_;Bacteria;_;Bacteroidota
[1] "ASV100"
$ASV10000;_;Bacteria;_;Abditibacteriota
[1] "ASV10000"
$ASV10001;_;Bacteria;_;Chloroflexi
[1] "ASV10001"
$ASV10002;_;Bacteria;_;Chloroflexi
[1] "ASV10002"
$ASV10003;_;Bacteria;_;Verrucomicrobiota
[1] "ASV10003"
$ASV10004;_;Bacteria;_;Planctomycetota
[1] "ASV10004"
$ASV10005;_;Bacteria;_;Verrucomicrobiota
[1] "ASV10005"
$ASV10006;_;Bacteria;_;Proteobacteria
[1] "ASV10006"
Thank you. The issue seems to be that in your tax table, there must be a column on index 1 with the ASV names. This is why each ASV is put into its own group. Remove that column (or move it to the end of your taxonomy table) and then your issue should be resolved. Let me know if that works for you.
From: Paddy Hooper @.> Sent: Thursday, September 1, 2022 5:22:27 PM To: joey711/phyloseq @.> Cc: gmteunisse @.>; Comment @.> Subject: Re: [joey711/phyloseq] Tax glom not operating on phyloseq object (Issue #1619)
Hi, sorry not too sure i follow what the output is meant to give me but the output of spCliques is a long output of asv IDs
The first 10 are:
spCliques[1:10] $ASV1;;Bacteria;;Firmicutes [1] "ASV1"
$ASV10;;Bacteria;;Proteobacteria [1] "ASV10" $ASV100;;Bacteria;;Bacteroidota [1] "ASV100"
$ASV10000;;Bacteria;;Abditibacteriota [1] "ASV10000"
$ASV10001;;Bacteria;;Chloroflexi [1] "ASV10001"
$ASV10002;;Bacteria;;Chloroflexi [1] "ASV10002" $ASV10003;;Bacteria;;Verrucomicrobiota [1] "ASV10003"
$ASV10004;;Bacteria;;Planctomycetota [1] "ASV10004"
$ASV10005;;Bacteria;;Verrucomicrobiota [1] "ASV10005"
$ASV10006;;Bacteria;;Proteobacteria [1] "ASV10006"
— Reply to this email directly, view it on GitHubhttps://github.com/joey711/phyloseq/issues/1619#issuecomment-1234432177, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AHBWS3SXR2VWD7IPXUMGG5TV4DC3HANCNFSM6AAAAAAQCIGHB4. You are receiving this because you commented.Message ID: @.***>
Bingo! I am sorry for not noticing sooner that I had a column with the ASVIDs within my taxonomy table! I removed this column and re-ran as below and it worked perfectly.
ps_16S_ave rank_names(ps_16S_ave) ps_16S_ave_phylum <- tax_glom(ps_16S_ave, "Phylum", NArm = FALSE) ps_16S_ave_phylum
Thank you so much for your help! A good lesson in double checking my tables before pushing them all into phyloseq!
All best wishes, Paddy
No worries, glad we were able to resolve it!
Hi,
I am hoping to agglomerate my phyloseq object by phylum. I have done this with other phyloseq objects and had no issues but when I try agglomerate based on "Phylum" taxonomic rank it returns the same number of ASVs as unagglomerated
My phyloseq object is made from an ASV count table, a taxonomy table, and metadata file.
phyloseq-class experiment-level object otu_table() OTU Table: [ 15250 taxa and 32 samples ] sample_data() Sample Data: [ 32 samples by 21 sample variables ] tax_table() Taxonomy Table: [ 15250 taxa by 8 taxonomic ranks ]
unique(tax_table(ps_16S_Ave)[,"Phylum"])
Output: Taxonomy Table: [49 taxa by 1 taxonomic ranks]: Phylum
ASV1 "Firmicutes"
ASV2 "Proteobacteria"
ASV3 "Cyanobacteria"
ASV7 "Actinobacteriota"
ASV13 "Verrucomicrobiota"
ASV14 "Gemmatimonadota"
ASV22 "Bacteroidota"
ASV42 "Chloroflexi"
ASV78 "Myxococcota"
ASV131 "Planctomycetota" ASV162 "Acidobacteriota"
ASV171 "Euryarchaeota"
ASV214 "Desulfobacterota"
ASV301 "MBNT15"
ASV475 "Halobacterota"
ASV537 "Spirochaetota"
ASV700 "Patescibacteria"
ASV850 "NB1-j"
ASV1049 "Cloacimonadota"
ASV1055 "Dependentiae"
ASV1098 "Nitrospinota"
ASV1123 "Campylobacterota"
ASV1173 "Sumerlaeota"
ASV1185 "SAR324_clade(Marine_group_B)" ASV1200 "Thermoplasmatota"
ASV1263 "Armatimonadota"
ASV1347 "Bdellovibrionota"
ASV1687 "WPS-2"
ASV1855 "Nitrospirota"
ASV1957 "Deinococcota"
ASV2009 "WS4"
ASV3031 "Zixibacteria"
ASV3283 "Latescibacterota"
ASV3379 "Methylomirabilota"
ASV3682 "Sva0485"
ASV4215 "Fibrobacterota"
ASV4693 "Nanoarchaeota"
ASV4976 "Caldisericota"
ASV5020 "FCPU426"
ASV5375 "Hydrogenedentes"
ASV5601 "Crenarchaeota"
ASV5890 "Abditibacteriota"
ASV8621 "Deferrisomatota"
ASV8692 "WS1"
ASV9982 "Elusimicrobiota" ASV10597 "Firestonebacteria"
ASV11903 "WS2"
ASV12037 "Margulisbacteria"
ASV15732 "Micrarchaeota"
From this I presume, my tax_glom should produce a phyloseq object with 49 taxa (one per phyla) but after a long processing time it returns:
ps_16S_phylum <- tax_glom(ps_16S_Ave, "Phylum") ps_16S_phylum
phyloseq-class experiment-level object otu_table() OTU Table: [ 15250 taxa and 32 samples ] sample_data() Sample Data: [ 32 samples by 21 sample variables ] tax_table() Taxonomy Table: [ 15250 taxa by 8 taxonomic ranks ]
Any ideas why it would not be agglomerating when it can recognise the phyla in my phyloseq object?
Many thanks in advance! Paddy