ropensci / taxa

taxonomic classes for R
https://docs.ropensci.org/taxa

Cross package error with metacoder/taxa. Filter_taxa throws taxon IDs do not exist. #122

Closed grabear closed 6 years ago

grabear commented 6 years ago

Hello @zachary-foster I'm having an issue with metacoder's parse_phyloseq function.

When I use a phyloseq object that contains a phylogenetic tree, the parse_phyloseq function fails with the following error:

...
# Generate a phyloseq object w/ a phylo-tree
> master_biom
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1955 taxa and 48 samples ]
sample_data() Sample Data:       [ 48 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 1955 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 1955 tips and 1953 internal nodes ]

# Fail to generate a metacoder object
> metacoder::parse_phyloseq(master_biom)
 Error: The following taxon IDs do not exist:
 edge, edge.length, Nnode, node.label, tip.label 
10.  stop(call. = FALSE, paste0("The following taxon IDs do not exist:\n", 
    limited_print(invalid_ids, type = "silent"))) 
9.  FUN(X[[i]], ...) 
8.  lapply(selection[is_char], function(x) {
    result <- match(x, self$taxon_ids())
    invalid_ids <- x[is.na(result)]
    if (length(invalid_ids) > 0) { ... 
7.  private$parse_nse_taxon_subset(subset) 
6.  self$supertaxa(subset = unique(data_taxon_ids[to_reassign]), 
    recursive = TRUE, simplify = FALSE, include_input = FALSE, 
    value = "taxon_indexes", na = FALSE) 
5.  FUN(X[[i]], ...) 
4.  lapply(seq_along(self$data)[reassign_obs], process_one) 
3.  output$filter_taxa(output$taxon_names() != "NA") 
2.  parse_phyloseq(phyloseq_object) at global.R#106
1.  get_metacoder_obj(master_biom) 

When using a phyloseq object without a phylogenetic tree my code runs fine:

...
# Generate a phyloseq object w/o a phylo-tree
> master_biom
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1955 taxa and 48 samples ]
sample_data() Sample Data:       [ 48 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 1955 taxa by 7 taxonomic ranks ]
# Generate a metacoder object
> master_metacoder <- get_metacoder_obj(master_biom)
Summing per-taxon counts from 48 columns for 435 taxa
There were 50 or more warnings (use warnings() to see the first 50)
> master_metacoder
<Taxmap>
  435 taxa: ab. Bacteria, ac. Archaea, ad. Unassigned ... ug. uncultured bacterium, uh. Ambiguous_taxa
  435 edges: NA->ab, NA->ac, NA->ad, ab->ae, ab->af, ab->ag ... ld->ty, gs->tz, lf->ub, kc->uc, lh->ug, hj->uh
  5 data sets:
    otu_table:
      # A tibble: 1,955 x 49
        taxon~ Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample~ Sample~ Sample~ Sample~ Sample~ Sample~ Sample~ Sample~
        <chr>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
      1 li      4.38e-5  7.07e-5 0         5.14e-5  1.07e-5 7.84e-5 4.81e-4 1.10e-4 6.62e-5 3.28e-4 1.14e-4 3.21e-4 2.12e-4
      2 lj      1.07e-3  1.06e-3 0.000250  2.07e-3  2.56e-4 1.91e-4 8.86e-4 1.23e-3 1.36e-3 1.63e-3 6.51e-4 8.34e-4 1.05e-3
      3 lj      7.01e-4  7.54e-4 0.00120   5.65e-4  4.14e-3 2.02e-4 1.22e-3 7.99e-3 8.45e-4 1.42e-3 3.10e-4 7.82e-4 1.56e-3
      # ... with 1,952 more rows, and 35 more variables: Sample_26 <dbl>, Sample_30 <dbl>, Sample_50 <dbl>, Sample_59
      #   <dbl>, Sample_60 <dbl>, Sample_61 <dbl>, Sample_7 <dbl>, Sample_27 <dbl>, Sample_45 <dbl>, Sample_5 <dbl>,
      #   Sample_57 <dbl>, Sample_20 <dbl>, Sample_29 <dbl>, Sample_11 <dbl>, Sample_12 <dbl>, Sample_15 <dbl>, Sample_16
      #   <dbl>, Sample_18 <dbl>, Sample_19 <dbl>, Sample_23 <dbl>, Sample_28 <dbl>, Sample_31 <dbl>, Sample_35 <dbl>,
      #   Sample_40 <dbl>, Sample_41 <dbl>, Sample_46 <dbl>, Sample_47 <dbl>, Sample_51 <dbl>, Sample_55 <dbl>, Sample_56
      #   <dbl>, Sample_58 <dbl>, Sample_8 <dbl>, Sample_4 <dbl>, Sample_52 <dbl>, Sample_36 <dbl>
    tax_data:
      # A tibble: 1,955 x 8
        taxon_id Kingdom  Phylum        Class       Order         Family          Genus        Species             
        <chr>    <chr>    <chr>         <chr>       <chr>         <chr>           <chr>        <chr>               
      1 li       Bacteria Firmicutes    Clostridia  Clostridiales Lachnospiraceae Roseburia    uncultured organism 
      2 lj       Bacteria Bacteroidetes Bacteroidia Bacteroidales Prevotellaceae  Prevotella 9 uncultured bacterium
      3 lj       Bacteria Bacteroidetes Bacteroidia Bacteroidales Prevotellaceae  Prevotella 9 uncultured bacterium
      # ... with 1,952 more rows
    sam_data:
      # A tibble: 48 x 9
        sample_ids X.SampleID BarcodeSequence LinkerPrimerSequence ForwardFastqFile           ReverseF~ Treatm~ Samp~ Desc~
        <chr>      <chr>      <chr>           <chr>                <chr>                      <chr>     <chr>   <chr> <chr>
      1 Sample_1   Sample_1   <NA>            <NA>                 33749_S1_L001_R1_001.fastq 33749_S1~ Stress~ 33749 Fecal
      2 Sample_2   Sample_2   <NA>            <NA>                 33739_S2_L001_R1_001.fastq 33739_S2~ Stress~ 33739 Fecal
      3 Sample_3   Sample_3   <NA>            <NA>                 33737_S3_L001_R1_001.fastq 33737_S3~ Stress~ 33737 Fecal
      # ... with 45 more rows
    tax_table:
      # A tibble: 435 x 49
        taxon~ Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample~ Sample~ Sample~ Sample~ Sample~ Sample~ Sample~ Sample~
      * <chr>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
      1 ab      9.87e-1  0.994    0.982    0.991    9.83e-1 9.88e-1 9.86e-1 0.991   9.88e-1 9.90e-1 9.92e-1 9.87e-1 9.88e-1
      2 ac      7.89e-5  0        0.00108  0        1.07e-5 1.34e-4 5.07e-5 0       4.41e-5 1.01e-4 2.48e-4 1.54e-4 3.85e-5
      3 ad      1.27e-2  0.00602  0.0172   0.00920  1.68e-2 1.16e-2 1.39e-2 0.00921 1.19e-2 1.00e-2 7.44e-3 1.27e-2 1.25e-2
      # ... with 432 more rows, and 35 more variables: Sample_26 <dbl>, Sample_30 <dbl>, Sample_50 <dbl>, Sample_59 <dbl>,
      #   Sample_60 <dbl>, Sample_61 <dbl>, Sample_7 <dbl>, Sample_27 <dbl>, Sample_45 <dbl>, Sample_5 <dbl>, Sample_57
      #   <dbl>, Sample_20 <dbl>, Sample_29 <dbl>, Sample_11 <dbl>, Sample_12 <dbl>, Sample_15 <dbl>, Sample_16 <dbl>,
      #   Sample_18 <dbl>, Sample_19 <dbl>, Sample_23 <dbl>, Sample_28 <dbl>, Sample_31 <dbl>, Sample_35 <dbl>, Sample_40
      #   <dbl>, Sample_41 <dbl>, Sample_46 <dbl>, Sample_47 <dbl>, Sample_51 <dbl>, Sample_55 <dbl>, Sample_56 <dbl>,
      #   Sample_58 <dbl>, Sample_8 <dbl>, Sample_4 <dbl>, Sample_52 <dbl>, Sample_36 <dbl>
    diff_table:
      # A tibble: 435 x 11
        taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value harti~ harti~ bimod~ bimod~
        <chr>    <chr>       <chr>                   <dbl>       <dbl>     <dbl>          <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
      1 ab       Stressed    Control               0.00272    0.00186   0.00106          0.313   0.940  0.654  0.645  0.629
      2 ac       Stressed    Control               3.48       0.000101  0.000744         0.0239  0.993  0.599  0.879  0.925
      3 ad       Stressed    Control              -0.200     -0.00185  -0.00180          0.195   0.863  0.510  0.559  0.650
      # ... with 432 more rows
  0 functions:

The error originates at line 86 of parse.R, which I linked to at the beginning:

 # Remove NA taxa
  output$filter_taxa(output$taxon_names() != "NA")

This error is generated by the filter_taxa function from the taxa package. I'm aware there has been some recent development on this function. It is worth noting that I have performed the same analysis with the same data several times over the past 2 to 3 weeks. Each time, my phyloseq object included a phylogenetic tree, and it worked perfectly.

Session info ------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32             
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  English_United States.1252  
 tz       America/Chicago             
 date     2018-01-22                  

Packages ----------------------------------------------------------------------------------------------------------------------
 package      * version    date       source                                
 ade4           1.7-10     2017-12-15 CRAN (R 3.4.3)                        
 ape            5.0        2017-10-30 CRAN (R 3.4.2)                        
 assertthat     0.2.0      2017-04-11 CRAN (R 3.4.1)                        
 base         * 3.4.3      2017-12-06 local                                 
 bindr          0.1        2016-11-13 CRAN (R 3.4.1)                        
 bindrcpp       0.2        2017-06-17 CRAN (R 3.4.1)                        
 Biobase        2.38.0     2017-10-31 Bioconductor                          
 BiocGenerics   0.24.0     2017-10-31 Bioconductor                          
 biomformat     1.6.0      2017-10-31 Bioconductor                          
 Biostrings     2.46.0     2017-10-31 Bioconductor                          
 cli            1.0.0      2017-11-05 CRAN (R 3.4.2)                        
 cluster        2.0.6      2017-03-10 CRAN (R 3.4.3)                        
 codetools      0.2-15     2016-10-05 CRAN (R 3.4.3)                        
 colorspace     1.3-2      2016-12-14 CRAN (R 3.4.2)                        
 compiler       3.4.3      2017-12-06 local                                 
 crayon         1.3.4      2017-09-16 CRAN (R 3.4.2)                        
 data.table     1.10.4-3   2017-10-27 CRAN (R 3.4.2)                        
 datasets     * 3.4.3      2017-12-06 local                                 
 devtools       1.13.4     2017-11-09 CRAN (R 3.4.2)                        
 digest         0.6.13     2017-12-14 CRAN (R 3.4.1)                        
 diptest      * 0.75-7     2016-12-05 CRAN (R 3.4.1)                        
 dplyr        * 0.7.4      2017-09-28 CRAN (R 3.4.2)                        
 foreach        1.4.4      2017-12-12 CRAN (R 3.4.3)                        
 ggplot2      * 2.2.1      2016-12-30 CRAN (R 3.4.2)                        
 ggtree       * 1.10.2     2018-01-03 Bioconductor                          
 glue           1.2.0      2017-10-29 CRAN (R 3.4.2)                        
 graphics     * 3.4.3      2017-12-06 local                                 
 grDevices    * 3.4.3      2017-12-06 local                                 
 grid           3.4.3      2017-12-06 local                                 
 gtable         0.2.0      2016-02-26 CRAN (R 3.4.2)                        
 igraph         1.1.2      2017-07-21 CRAN (R 3.4.2)                        
 IRanges        2.12.0     2017-10-31 Bioconductor                          
 iterators      1.0.9      2017-12-12 CRAN (R 3.4.3)                        
 jsonlite       1.5        2017-06-01 CRAN (R 3.4.1)                        
 lattice        0.20-35    2017-03-25 CRAN (R 3.4.3)                        
 lazyeval       0.2.1      2017-10-29 CRAN (R 3.4.2)                        
 magrittr       1.5        2014-11-22 CRAN (R 3.4.1)                        
 MASS           7.3-47     2017-02-26 CRAN (R 3.4.3)                        
 Matrix         1.2-12     2017-11-20 CRAN (R 3.4.3)                        
 memoise        1.1.0      2017-04-21 CRAN (R 3.4.1)                        
 metacoder    * 0.2.0.9001 2018-01-19 Github (grunwaldlab/metacoder@ecd67db)
 methods      * 3.4.3      2017-12-06 local                                 
 mgcv           1.8-22     2017-09-24 CRAN (R 3.4.3)                        
 modes        * 0.7.0      2016-03-07 CRAN (R 3.4.1)                        
 multtest       2.34.0     2017-10-31 Bioconductor                          
 munsell        0.4.3      2016-02-13 CRAN (R 3.4.2)                        
 nlme           3.1-131    2017-02-06 CRAN (R 3.4.3)                        
 parallel       3.4.3      2017-12-06 local                                 
 permute        0.9-4      2016-09-09 CRAN (R 3.4.2)                        
 phyloseq     * 1.22.3     2017-11-06 Bioconductor                          
 pillar         1.0.1      2017-11-27 CRAN (R 3.4.3)                        
 pkgconfig      2.0.1      2017-03-21 CRAN (R 3.4.1)                        
 plyr           1.8.4      2016-06-08 CRAN (R 3.4.1)                        
 purrr          0.2.4      2017-10-18 CRAN (R 3.4.2)                        
 R6             2.2.2      2017-06-17 CRAN (R 3.4.1)                        
 Rcpp           0.12.14    2017-11-23 CRAN (R 3.4.2)                        
 reshape2       1.4.3      2017-12-11 CRAN (R 3.4.3)                        
 rhdf5          2.22.0     2017-10-31 Bioconductor                          
 rlang          0.1.6      2017-12-21 CRAN (R 3.4.3)                        
 rvcheck        0.0.9      2017-07-10 CRAN (R 3.4.1)                        
 S4Vectors      0.16.0     2017-10-31 Bioconductor                          
 scales         0.5.0      2017-08-24 CRAN (R 3.4.2)                        
 splines        3.4.3      2017-12-06 local                                 
 stats        * 3.4.3      2017-12-06 local                                 
 stats4         3.4.3      2017-12-06 local                                 
 stringi        1.1.6      2017-11-17 CRAN (R 3.4.2)                        
 stringr        1.2.0      2017-02-18 CRAN (R 3.4.1)                        
 survival       2.41-3     2017-04-04 CRAN (R 3.4.3)                        
 taxa         * 0.2.0.9104 2018-01-19 Github (ropensci/taxa@652b060)        
 tibble         1.4.1      2017-12-25 CRAN (R 3.4.3)                        
 tidyr          0.7.2      2017-10-16 CRAN (R 3.4.2)                        
 tools          3.4.3      2017-12-06 local                                 
 treeio       * 1.2.1      2017-11-02 Bioconductor                          
 utf8           1.1.3      2018-01-03 CRAN (R 3.4.3)                        
 utils        * 3.4.3      2017-12-06 local                                 
 vegan          2.4-5      2017-12-01 CRAN (R 3.4.3)                        
 withr          2.1.1      2017-12-19 CRAN (R 3.4.3)                        
 XVector        0.18.0     2017-10-31 Bioconductor                          
 yaml           2.1.16     2017-12-12 CRAN (R 3.4.3)                        
 zlibbioc       1.24.0     2017-10-31 Bioconductor

grabear commented 6 years ago

Not a huge issue as you can just remove the phylogenetic tree.
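For anyone who wants to apply that workaround, here is a minimal sketch of one way to rebuild a phyloseq object without its tree. It assumes the object has an OTU table, sample data, and a taxonomy table, as `master_biom` does above. `drop_phy_tree` is a hypothetical helper name, not a phyloseq function.

```r
# Sketch of the workaround: rebuild the phyloseq object from every
# component except the phylogenetic tree.
# `drop_phy_tree` is a hypothetical helper, not part of phyloseq.
drop_phy_tree <- function(ps) {
  phyloseq::phyloseq(
    phyloseq::otu_table(ps),    # OTU abundances
    phyloseq::sample_data(ps),  # per-sample metadata
    phyloseq::tax_table(ps)     # taxonomic ranks
  )
}

# Usage (assumes `master_biom` is the phyloseq object from the report above):
# master_biom_no_tree <- drop_phy_tree(master_biom)
# metacoder::parse_phyloseq(master_biom_no_tree)
```

The original object is left untouched, so the tree is still available for analyses that need it.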

zachary-foster commented 6 years ago

Ahh yes, I added some taxon ID validation, but it assumed that all datasets would be classified by taxon ID (which is not true). I think I fixed it. You will likely see some warning messages now when filtering. In most cases, there is no advantage to having data not classified by taxon ID in a taxmap object, but the sample and phylo data are in the output of parse_phyloseq because that's the best place I can think of to put them.

Reinstall and let me know how it goes.
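The failure mode described above can be illustrated with a small base-R sketch. It is not the actual taxa code. The strict check mirrors the `match()` logic visible in the traceback, and the element names of `tree_like` are taken from the error message. The function names, the example data, and the warn-and-skip behaviour are assumptions made for illustration only.

```r
# Hypothetical stand-ins for illustration; not the real taxa internals.
taxon_ids <- c("ab", "ac", "ad")

# A data set classified by taxon ID: its element names are taxon IDs.
abundance <- c(ab = 0.98, ac = 0.01, ad = 0.01)

# A list shaped like a tree object: its names are component names
# (copied from the error message), not taxon IDs.
tree_like <- list(edge = matrix(1:4, ncol = 2), edge.length = c(0.1, 0.2),
                  Nnode = 1L, node.label = "n1", tip.label = c("t1", "t2"))

# Strict check, following the match() logic shown in the traceback.
validate_strict <- function(ids, known_ids) {
  invalid_ids <- ids[is.na(match(ids, known_ids))]
  if (length(invalid_ids) > 0) {
    stop("The following taxon IDs do not exist:\n",
         paste(invalid_ids, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Lenient check (assumed behaviour): warn and report FALSE for data sets
# whose names are not taxon IDs, so the caller can leave them unfiltered.
is_classified <- function(dataset, known_ids) {
  ids <- names(dataset)
  ok <- !is.null(ids) && all(ids %in% known_ids)
  if (!ok) {
    warning("Data set is not named by taxon IDs; leaving it unfiltered.",
            call. = FALSE)
  }
  ok
}

validate_strict(names(abundance), taxon_ids)  # passes silently

# The tree-like list trips the strict check, as in the report.
strict_result <- tryCatch(
  validate_strict(names(tree_like), taxon_ids),
  error = function(e) conditionMessage(e)
)

# The lenient check only warns (suppressed here) and returns FALSE.
lenient_result <- suppressWarnings(is_classified(tree_like, taxon_ids))
```

In this sketch, treating every data set's names as taxon IDs is what makes the tree's component names look like invalid IDs.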

zachary-foster commented 6 years ago

Hello @grabear, has this been fixed? Can we close? Thanks

grabear commented 6 years ago

Yes! It's working perfectly now. Thanks for the help with this one, @zachary-foster.

zachary-foster commented 6 years ago

Cool, no problem!