joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
582 stars 187 forks source link

estimate_richness fails -- Component taxa/OTU names do not match. #1708

Open r-mashoodh opened 11 months ago

r-mashoodh commented 11 months ago

I know this issue has come up a bunch (e.g., here: https://github.com/joey711/phyloseq/issues/1018 and https://github.com/joey711/phyloseq/issues/552 - but I noticed its closed but with many people asking the same question, so I thought I'd ask again).

This code worked a couple years ago, but now I'm getting the following error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'otu_table': invalid class “phyloseq” object: 
 Component taxa/OTU names do not match.
 Taxa indices are critical to analysis.
 Try taxa_names()

(code provided below; ordination methods work fine)

I've implemented several of the changes mentioned across the posts with a similar error message but I cannot seem to plot or estimate richness.

I have update all packages. I have used subset_samples instead of prune_samples. I've used as.integer ... I would appreciate any advice.

I have saved the phyloseq objects ps and ps1 so that they could be inspected.

cheers. and thanks in advance!

# Read in OTU table
otu_table_in <- read.table("QIIME2_preprocessing//biom/otu_table_new.txt", sep = "\t", row.names = 1, header = TRUE)
otu_table_in <- as.matrix(otu_table_in)

# Read in taxonomy
# Separated by kingdom, phylum, class, order, family, genus, species
taxonomy <- read.table("QIIME2_preprocessing//biom/taxonomy.tsv", header=T, sep='\t')
taxonomy<- separate(taxonomy, Taxon, c("Kingdom","Phylum","Class","Order", "Family", "Genus", "Species"), sep= ";", remove=TRUE)

## make sure to order things so that tax_names match for both OTU table and taxonomy table ##
taxonomy <- taxonomy[match(rownames(otu_table_in), taxonomy$Feature.ID), ]
rownames(taxonomy) <- taxonomy$Feature.ID
taxonomy <- taxonomy %>% select(-Feature.ID)
taxonomy <- as.matrix(taxonomy)

## check rownames
all(rownames(otu_table_in) == rownames(taxonomy))
#TRUE

# Read in metadata
metadata <- read.table("QIIME2_preprocessing//biom/metadata.txt", row.names = 1, header=T)
# Create a variable to separate experiments
metadata$experiment <- ifelse(is.na(metadata$FOOD_2E), 0, 1) #1 denotes samples that are in expt: FOOD_2E

# Read in tree
phy_tree <- read_tree("QIIME2_preprocessing//biom/tree.nwk")

# Import all as phyloseq objects
OTU <- otu_table(otu_table_in, taxa_are_rows = TRUE)
TAX <- tax_table(taxonomy)
META <- sample_data(metadata)

# Sanity checks for consistent OTU names
head(taxa_names(TAX), 4)
head(taxa_names(OTU), 4)

# Same sample names
sample_names(OTU)
sample_names(META)

sample_names(OTU) <- sample_names(META) # double check this has worked

# Finally merge
ps <- phyloseq(OTU, TAX, META, phy_tree)

keep_samples <- rownames(subset(metadata, metadata$experiment==1))
ps1 <- prune_samples(sample_names(ps) %in% keep_samples, ps)
ps1 = filter_taxa(ps1, function(x) sum(x > 5) > (0.05 * length(x)), TRUE) 
ps1 = transform_sample_counts(ps1, function(x) x / sum(x))

## Estimate richness 
ps1.rich <-estimate_richness(transform_sample_counts(ps1, as.integer), 
                             measures=c("Chao1", "Shannon"))

gets me the following error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'otu_table': invalid class “phyloseq” object: 
 Component taxa/OTU names do not match.
 Taxa indices are critical to analysis.
 Try taxa_names()

I have double checked that taxa and OTU names match:

> all(rownames(otu_table_in) == rownames(taxonomy))
[1] TRUE
> # Sanity checks for consistent OTU names
> head(taxa_names(TAX), 4)
[1] "4612c0d0098e033627c181ad74b0ac38" "06e2fddd55a126874c77aa6ebacb97f8"
[3] "ffbdce3deba8ab3b7dca9b31ad06dc21" "860fbf7bf96d9afcbe5eebf96800389d"
> head(taxa_names(OTU), 4)
[1] "4612c0d0098e033627c181ad74b0ac38" "06e2fddd55a126874c77aa6ebacb97f8"
[3] "ffbdce3deba8ab3b7dca9b31ad06dc21" "860fbf7bf96d9afcbe5eebf96800389d"
> ps1
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 517 taxa and 71 samples ]
sample_data() Sample Data:       [ 71 samples by 11 sample variables ]
tax_table()   Taxonomy Table:    [ 517 taxa by 8 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 517 tips and 514 internal nodes ]

> setdiff(taxa_names(otu_table(ps1)), taxa_names(tax_table(ps1)))
character(0)
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] vegan_2.6-4     lattice_0.21-8  permute_0.9-7   knitr_1.43      lubridate_1.9.2
 [6] forcats_1.0.0   stringr_1.5.0   dplyr_1.1.2     purrr_1.0.1     readr_2.1.4    
[11] tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0 phyloseq_1.38.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-162                bitops_1.0-7                matrixStats_0.63.0         
 [4] bit64_4.0.5                 RColorBrewer_1.1-3          httr_1.4.6                 
 [7] GenomeInfoDb_1.30.1         tools_4.1.2                 utf8_1.2.3                 
[10] R6_2.5.1                    DBI_1.1.3                   BiocGenerics_0.40.0        
[13] mgcv_1.8-42                 colorspace_2.1-0            rhdf5filters_1.6.0         
[16] ade4_1.7-22                 withr_2.5.0                 tidyselect_1.2.0           
[19] DESeq2_1.34.0               bit_4.0.5                   compiler_4.1.2             
[22] cli_3.6.1                   Biobase_2.54.0              DelayedArray_0.20.0        
[25] scales_1.2.1                genefilter_1.76.0           digest_0.6.31              
[28] rmarkdown_2.21              XVector_0.34.0              pkgconfig_2.0.3            
[31] htmltools_0.5.5             MatrixGenerics_1.6.0        highr_0.10                 
[34] fastmap_1.1.1               rlang_1.1.1                 rstudioapi_0.14            
[37] RSQLite_2.3.1               generics_0.1.3              jsonlite_1.8.4             
[40] BiocParallel_1.28.3         RCurl_1.98-1.12             magrittr_2.0.3             
[43] GenomeInfoDbData_1.2.7      biomformat_1.22.0           Matrix_1.5-1               
[46] Rcpp_1.0.10                 munsell_0.5.0               S4Vectors_0.32.4           
[49] Rhdf5lib_1.16.0             fansi_1.0.4                 ape_5.7-1                  
[52] lifecycle_1.0.3             stringi_1.7.12              yaml_2.3.7                 
[55] MASS_7.3-60                 SummarizedExperiment_1.24.0 zlibbioc_1.40.0            
[58] rhdf5_2.38.1                plyr_1.8.8                  grid_4.1.2                 
[61] blob_1.2.4                  parallel_4.1.2              crayon_1.5.2               
[64] Biostrings_2.62.0           splines_4.1.2               multtest_2.50.0            
[67] annotate_1.72.0             hms_1.1.3                   KEGGREST_1.34.0            
[70] locfit_1.5-9.7              pillar_1.9.0                igraph_1.4.2               
[73] GenomicRanges_1.46.1        geneplotter_1.72.0          reshape2_1.4.4             
[76] codetools_0.2-19            stats4_4.1.2                XML_3.99-0.14              
[79] glue_1.6.2                  evaluate_0.21               data.table_1.14.8          
[82] tzdb_0.4.0                  vctrs_0.6.2                 png_0.1-8                  
[85] foreach_1.5.2               gtable_0.3.3                cachem_1.0.8               
[88] xfun_0.39                   xtable_1.8-4                survival_3.5-5             
[91] iterators_1.0.14            AnnotationDbi_1.56.2        memoise_2.0.1              
[94] IRanges_2.28.0              cluster_2.1.4               timechange_0.2.0    

/

r-mashoodh commented 11 months ago

I guess the issue might be normalisation here -- where as.integer is failing.

Is it best to stick to raw counts for alpha diversity measures? Would the same be true for beta diversity (ordination methods)?

Nwilliams96 commented 10 months ago

Hey, so I have had the same issue. Try running just shannon and it should work. I also have normalised data and shannon and simpon work, but Chao1 does not.

I hope this helps somewhat.