Closed aaronsaunders closed 11 years ago
Sorry for the delay. This sounds at first like a difficult problem to diagnose without a reproducible example with data. I originally started writing a response asking if you could make this, but in drafting an example I was able to reproduce the error.
library("phyloseq")
data("GlobalPatterns")
tax_glom(physeq=GlobalPatterns, taxrank=rank_names(GlobalPatterns)[1])
Error in apply(tax, 1, function(i) { : dim(X) must have a positive length
tax_glom(physeq=GlobalPatterns, taxrank="Kingdom")
Error in apply(tax, 1, function(i) { : dim(X) must have a positive length
On the other hand, if you're just hoping to create a table of sums based on taxonomic elements, you don't actually need to use tax_glom. For example (continuing the code called above):
tapply(taxa_sums(GlobalPatterns), factor(tax_table(GlobalPatterns)[, "Kingdom"]), sum)
Archaea Bacteria
195598 28021080
First, does this solve your problem? Second, have you encountered this other than in the left-most rank? It looks as though this is a problem with R automatically converting the 1-column taxonomy table character matrix into a character vector, which would return NULL when dim is called internally by the apply function. I will try to sniff this out. The solution above using tapply for getting sums is much faster than tax_glom, though, because it avoids carefully pruning the tree and other data management steps that you don't need if you just want the sums.
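The dimension-dropping behaviour described above can be reproduced in base R without phyloseq. The following is a minimal sketch on a toy character matrix (the matrix `tax` and its contents are made up for illustration). It shows the standard `drop = FALSE` idiom, which is not necessarily the exact change made inside phyloseq.

```r
# Toy taxonomy matrix: 3 taxa x 2 ranks (hypothetical data, not GlobalPatterns)
tax <- matrix(
  c("Bacteria", "Bacteria", "Archaea",
    "Firmicutes", "Proteobacteria", "Euryarchaeota"),
  ncol = 2,
  dimnames = list(c("otu1", "otu2", "otu3"), c("Kingdom", "Phylum"))
)

# Selecting a single column silently drops the matrix to a plain vector
dropped <- tax[, 1]
is.null(dim(dropped))  # the dim attribute is gone

# apply() needs a dim attribute, so this call fails; capture the message
res <- tryCatch(
  apply(dropped, 1, paste, collapse = ";"),
  error = function(e) conditionMessage(e)
)
print(res)

# drop = FALSE keeps a 3 x 1 matrix, so apply() works again
kept <- tax[, 1, drop = FALSE]
dim(kept)
labels <- apply(kept, 1, paste, collapse = ";")
labels
```

Any code that subsets a matrix down to a variable number of columns and then passes the result to `apply` is exposed to the same failure when that number happens to be one.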
I will leave this issue open until I have sniffed out and squashed this bug. It does look like it only applies to tax_glom for the left-most rank. Please anyone let me know if there are examples outside of this scope.
I am working out a fix. I'll announce here when it is posted to the github-devel branch. The phyloseq version number will be 1.5.20 or greater.
Yep, this was fixed in the aforementioned commit:
04dc4336843b5e172ae0fe7cecd24b35437eefde
Hi Joey,
I have had a similar issue to the one mentioned above. I created my own taxonomic string with these functions:
split_species = function(string, n = 2) {
  splits = str_split(string, "/", n + 1)
  res = map_if(splits, ~length(.x) > 2, ~.x[1:n]) %>%
    map_chr(str_c, collapse = "/")
  return(res)
}

add_taxonomy_column = function(physeq, num_species = 2) {
  tax_df = as.data.frame(tax_table(physeq)) %>%
    rownames_to_column("OTU") %>%
    mutate(Species = split_species(Species, n = num_species)) %>%
    mutate(Taxonomy = case_when(
      is.na(Class) ~ str_c("p:", Phylum),
      is.na(Order) ~ str_c("c:", Class),
      is.na(Family) ~ str_c("o:", Order),
      is.na(Genus) ~ str_c("f:", Family),
      is.na(Species) ~ str_c("g:", Genus),
      TRUE ~ str_c(Genus, " ", Species)
    ))

  tax = as.matrix(tax_df[, -1])
  rownames(tax) = tax_df$OTU
  tax_table(physeq) = tax_table(tax)

  return(physeq)
}
As seen here: https://rdrr.io/github/mworkentine/mattsUtils/src/R/microbiome_helpers.R#sym-add_taxonomy_column
If applied to the GlobalPatterns dataset, this creates an output with 19216 taxa. If I use the tax_glom function on the new "Taxonomy" column, the GlobalPatterns dataset gets agglomerated to 2306 taxa. However, there are actually only 2217 unique Taxonomy labels in this dataset, i.e. 89 taxa are not agglomerated (one example being f:Oceanospirillaceae).
Here is the example code:
data("GlobalPatterns")
GP <- add_taxonomy_column(GlobalPatterns)
ntaxa(GP)
GP_taxonomy <- tax_glom(GP, "Taxonomy")
ntaxa(GP_taxonomy)
unique <- unique(GP@tax_table[,8])
It seems that the function does not agglomerate taxa that share a Taxonomy label if they are in fact different species, or if one of the higher classifications differs between them, e.g. g:Clostridium.
In my own dataset the annotation is considerably worse than that of the GlobalPatterns dataset, and in some cases the most specific classification assigned is domain Bacteria. However, if I have two taxa that are both assigned d:bacteria, these do not agglomerate when using tax_glom.
My intention here is to get as much information from the taxonomic assignments as possible, but not to have any duplicate assignments in my dataset.
I would very much appreciate your help.
Thank you, Anni
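One possible explanation for the leftover duplicates, stated here as an assumption rather than something confirmed from the phyloseq source: if tax_glom groups taxa by the whole lineage up to and including the chosen rank, rather than by the label in that single column, then two taxa that share a Taxonomy label but differ (or are NA) at a higher rank would stay separate. The base R sketch below uses a made-up three-taxon table (all names and counts are hypothetical) to show how the two grouping keys differ, and how `rowsum()` can sum counts by label alone.

```r
# Hypothetical taxonomy: otu1 and otu2 share a Taxonomy label but differ in Order
tax <- matrix(
  c("OrderA", "OrderB", "OrderA",
    "f:FamX", "f:FamX", "f:FamY"),
  ncol = 2,
  dimnames = list(c("otu1", "otu2", "otu3"), c("Order", "Taxonomy"))
)

# Hypothetical counts: 3 taxa x 2 samples (filled column by column)
counts <- matrix(
  c(10, 5, 1,
    20, 0, 4),
  ncol = 2,
  dimnames = list(rownames(tax), c("s1", "s2"))
)

# Key on the full lineage: all three taxa remain distinct groups
lineage_key <- apply(tax, 1, paste, collapse = ";")
length(unique(lineage_key))

# Key on the label only: otu1 and otu2 fall into the same group
label_key <- tax[, "Taxonomy"]
length(unique(label_key))

# Sum counts per label, ignoring the higher ranks entirely
by_label <- rowsum(counts, group = label_key)
by_label
```

This only produces a summed count table. It does not update a tree or the rest of a phyloseq object, so it fits the case where duplicate-free label totals are all that is needed.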
I'm getting a similar error with tip_glom. Here's a reproducible example:
library(phyloseq)
data(enterotype)
# create random tree
symbiont_tree = ape::rtree(phyloseq::ntaxa(enterotype))
symbiont_tree$tip.label = phyloseq::taxa_names(enterotype)
# phyloseq object with tree
physeq = phyloseq::phyloseq(
phyloseq::otu_table(enterotype),
phyloseq::tax_table(enterotype),
phyloseq::sample_data(enterotype),
phyloseq::phy_tree(symbiont_tree)
)
# tip glom
phyloseq::tip_glom(physeq, h=2)
The error is:
Error in apply(taxmerge, 2, function(i) { :
dim(X) must have a positive length
My sessionInfo:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /opt/microsoft/ropen/3.4.3/lib64/R/lib/libRblas.so
LAPACK: /opt/microsoft/ropen/3.4.3/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] phyloseq_1.22.3 RevoUtilsMath_10.0.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 compiler_3.4.3 pillar_1.2.3 plyr_1.8.4 XVector_0.18.0 iterators_1.0.9 tools_3.4.3 zlibbioc_1.24.0
[9] packrat_0.4.8-1 jsonlite_1.5 tibble_1.4.2 nlme_3.1-131 rhdf5_2.22.0 gtable_0.2.0 lattice_0.20-35 mgcv_1.8-22
[17] pkgconfig_2.0.1 rlang_0.2.1 Matrix_1.2-12 foreach_1.4.4 igraph_1.2.1 parallel_3.4.3 stringr_1.3.1 cluster_2.0.6
[25] Biostrings_2.46.0 RevoUtils_10.0.7 S4Vectors_0.16.0 IRanges_2.12.0 multtest_2.34.0 stats4_3.4.3 ade4_1.7-11 grid_3.4.3
[33] Biobase_2.38.0 data.table_1.11.4 survival_2.41-3 reshape2_1.4.3 ggplot2_2.2.1 magrittr_1.5 splines_3.4.3 scales_0.5.0
[41] codetools_0.2-15 MASS_7.3-47 BiocGenerics_0.24.0 biomformat_1.6.0 permute_0.9-4 ape_5.1 colorspace_1.3-2 stringi_1.2.2
[49] lazyeval_0.2.1 munsell_0.4.3 vegan_2.5-2
I am trying to count the assignments at each level and use tax_glom to summarise the tax_table. But tax_glom() with rank_names[1] throws an error.
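Counting the assignments at each level does not strictly require agglomeration. The sketch below works on a plain character matrix with one column per rank. The toy matrix is hypothetical; with a real phyloseq object, the equivalent matrix could be obtained with `as(tax_table(physeq), "matrix")`.

```r
# Hypothetical taxonomy matrix: 4 taxa x 2 ranks, with one unassigned Phylum
tax <- matrix(
  c("Bacteria", "Bacteria", "Archaea", "Bacteria",
    "Firmicutes", "Proteobacteria", NA, "Firmicutes"),
  ncol = 2,
  dimnames = list(paste0("otu", 1:4), c("Kingdom", "Phylum"))
)

# Number of distinct (non-NA) assignments at each rank
n_assigned <- apply(tax, 2, function(col) length(unique(col[!is.na(col)])))
n_assigned

# Frequency table of assignments for each rank, keeping NA as its own category
per_rank <- lapply(
  setNames(colnames(tax), colnames(tax)),
  function(r) table(tax[, r], useNA = "ifany")
)
per_rank$Phylum
```

Because `apply` is called on the full matrix and each rank is pulled out by name inside `lapply`, this avoids the single-column `apply` call that triggers the error above.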