microbiome / mia

Microbiome analysis
https://microbiome.github.io/mia/
Artistic License 2.0
45 stars 25 forks source link

Issue 334 estimate diversity #514

Closed Insaynoah closed 2 months ago

Insaynoah commented 2 months ago

This should do the trick for #334. I've made it so that, if there's no tree and someone tries to calculate a faith index, the function will not calculate it and throws a warning instead.

Insaynoah commented 2 months ago

@antagomir I removed the trailing whitespaces in all the if statements.

antagomir commented 2 months ago

awesome!

antagomir commented 2 months ago

Hmm there are still some failing checks?

Insaynoah commented 2 months ago

I forgot to add the boolean initialisation. The checks have passed now

antagomir commented 2 months ago

perfect.

TuomasBorman commented 2 months ago

I was just giving my feedback. This PR contained some problematic stuff

TuomasBorman commented 2 months ago

Also this was already fixed with commit https://github.com/microbiome/mia/commit/23cffa4b2183e4de5ff414eae7bef51147615c4d if I'm not mistaken. (It was not linked to issue; we should focus more on closing issues if they are already fixed)

> library(mia)
> data(GlobalPatterns)
> tse <- GlobalPatterns
> rowTree(tse) <- NULL
> # All index names as known by the function
> index <- c("shannon","gini_simpson","inverse_simpson", "coverage", "fisher",
+            "faith",  "log_modulo_skewness")
> # Calculate diversities
> tse <- estimateDiversity(tse, index = index)
Warning message:
Faith diversity has been excluded from the results since it cannot be calculated without rowTree. This requires a rowTree in the input argument x. Make sure that 'rowTree(x)' is not empty, or make sure to specify 'tree_name' in the input arguments. Warning is also provided if the tree does not have any branches. You can consider adding rowTree to include this index. 
> sessionInfo()
R Under development (unstable) (2024-01-12 r85803)
Platform: x86_64-pc-linux-gnu
Running under: Linux Mint 21

Matrix products: default
BLAS:   /opt/R/devel/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=fi_FI.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=fi_FI.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Helsinki
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] miaViz_1.11.3                   ggraph_2.1.0                    ggplot2_3.5.0                   mia_1.11.9                     
 [5] MultiAssayExperiment_1.29.0     TreeSummarizedExperiment_2.11.0 Biostrings_2.71.5               XVector_0.43.1                 
 [9] SingleCellExperiment_1.25.0     SummarizedExperiment_1.33.2     Biobase_2.63.0                  GenomicRanges_1.55.1           
[13] GenomeInfoDb_1.39.5             IRanges_2.37.0                  S4Vectors_0.41.3                BiocGenerics_0.49.1            
[17] MatrixGenerics_1.15.0           matrixStats_1.2.0              

loaded via a namespace (and not attached):
  [1] rstudioapi_0.15.0           jsonlite_1.8.8              magrittr_2.0.3              ggbeeswarm_0.7.2            farver_2.1.1               
  [6] fs_1.6.3                    zlibbioc_1.49.0             vctrs_0.6.5                 memoise_2.0.1               DelayedMatrixStats_1.25.1  
 [11] RCurl_1.98-1.14             ggtree_3.11.1               htmltools_0.5.7             S4Arrays_1.3.1              usethis_2.2.2              
 [16] curl_5.2.0                  BiocNeighbors_1.21.2        gridGraphics_0.5-1          SparseArray_1.3.2           htmlwidgets_1.6.4          
 [21] desc_1.4.3                  testthat_3.2.1              plyr_1.8.9                  DECIPHER_2.31.0             cachem_1.0.8               
 [26] commonmark_1.9.0            igraph_2.0.3                mime_0.12                   lifecycle_1.0.4             pkgconfig_2.0.3            
 [31] rsvd_1.0.5                  Matrix_1.6-5                R6_2.5.1                    fastmap_1.1.1               rcmdcheck_1.4.0            
 [36] GenomeInfoDbData_1.2.11     shiny_1.8.0                 digest_0.6.34               aplot_0.2.2                 ggnewscale_0.4.9           
 [41] colorspace_2.1-0            ps_1.7.5                    patchwork_1.2.0             rprojroot_2.0.4             scater_1.31.2              
 [46] irlba_2.3.5.1               pkgload_1.3.3               RSQLite_2.3.4               vegan_2.6-5                 beachmat_2.19.0            
 [51] fansi_1.0.6                 polyclip_1.10-6             abind_1.4-5                 mgcv_1.9-1                  compiler_4.4.0             
 [56] remotes_2.4.2.1             bit64_4.0.5                 withr_2.5.2                 BiocParallel_1.37.1         viridis_0.6.4              
 [61] DBI_1.2.1                   pkgbuild_1.4.3              ggforce_0.4.1               MASS_7.3-60.2               DelayedArray_0.29.0        
 [66] sessioninfo_1.2.2           bluster_1.13.0              permute_0.9-7               tools_4.4.0                 vipor_0.4.7                
 [71] beeswarm_0.4.0              ape_5.7-1                   httpuv_1.6.13               glue_1.7.0                  callr_3.7.3                
 [76] nlme_3.1-164                promises_1.2.1              grid_4.4.0                  cluster_2.1.6               reshape2_1.4.4             
 [81] generics_0.1.3              gtable_0.3.4                tidyr_1.3.0                 BiocSingular_1.19.0         tidygraph_1.3.0            
 [86] ScaledMatrix_1.11.0         xml2_1.3.6                  utf8_1.2.4                  ggrepel_0.9.5               pillar_1.9.0               
 [91] stringr_1.5.1               yulab.utils_0.1.3           later_1.3.2                 splines_4.4.0               tweenr_2.0.2               
 [96] dplyr_1.1.4                 treeio_1.27.0               lattice_0.22-5              bit_4.0.5                   tidyselect_1.2.0           
[101] DirichletMultinomial_1.45.0 miniUI_0.1.1.1              scuttle_1.13.0              knitr_1.46                  gridExtra_2.3              
[106] xfun_0.43                   graphlayouts_1.0.2          brio_1.1.4                  devtools_2.4.5              stringi_1.8.3              
[111] xopen_1.0.0                 ggfun_0.1.3                 lazyeval_0.2.2              codetools_0.2-19            tibble_3.2.1               
[116] BiocManager_1.30.22         ggplotify_0.1.2             cli_3.6.2                   xtable_1.8-4                processx_3.8.3             
[121] munsell_0.5.0               roxygen2_7.3.1              Rcpp_1.0.12                 parallel_4.4.0              ellipsis_0.3.2             
[126] blob_1.2.4                  prettyunits_1.2.0           profvis_0.3.8               urlchecker_1.0.1            sparseMatrixStats_1.15.0   
[131] bitops_1.0-7                decontam_1.23.0             viridisLite_0.4.2           tidytree_0.4.6              scales_1.3.0               
[136] purrr_1.0.2                 crayon_1.5.2                rlang_1.1.3      
antagomir commented 2 months ago

Thanks @TuomasBorman and sorry for messing it up. This seemed like a simple fix to me.

With reversion it should be OK and this was good practice anyway.

But now, is the original issue solved? If user has SE with no tree, and Faith is called, this should throw a warning instead or error. That must be fixed, if not already. It has caused constant issues in training events etc.

Otherwise, all good.

TuomasBorman commented 2 months ago

Yes, it was good practice and it did not cause any problems.

With TreeSE, the supported indices include faith. When there is no rowTree and indices contain faith, user gets warning.

> asd <- estimateDiversity(tse)
Warning message:
Faith diversity has been excluded from the results since it cannot be calculated without rowTree. This requires a rowTree in the input argument x. Make sure that 'rowTree(x)' is not empty, or make sure to specify 'tree_name' in the input arguments. Warning is also provided if the tree does not have any branches. You can consider adding rowTree to include this index. 

or

 > asd <- estimateDiversity(tse, index = index)
Warning message:
Faith diversity has been excluded from the results since it cannot be calculated without rowTree. This requires a rowTree in the input argument x. Make sure that 'rowTree(x)' is not empty, or make sure to specify 'tree_name' in the input arguments. Warning is also provided if the tree does not have any branches. You can consider adding rowTree to include this index. 

With SE, faith is not included in supported indices. In my opinion, user should get error when trying to calculate something that is not supported.

# No warnings or errors
> asd <- estimateDiversity(se)

However, when faith is inputted, this gives "wrong" error. It should say that "user tried to calculate non-supported index".

> asd <- estimateDiversity(se, index = index)
Error: 'name' must be a non-empty character value and have the same length than 'index'.

# This works
> asd <- estimateDiversity(se, index = index, name = index[-1])

This is because we use match.args function (it takes correct subset from user's input), however, name = index by default and it is not subsetted.

  1. So the problem described in the issue was already solved: Give warning when TreeSE does not contain rowTree.
  2. When SE/TreeSE contains unsupported indices, we could modify input check by replacing match.arg with
supported_index <- c(
    "coverage", "fisher", "gini_simpson", "inverse_simpson",
    "log_modulo_skewness", "shannon")
if( any(!index %in% supported_index) ){
    stop("'index' contains unsupported indices.", call. = FALSE)
}

To try that everything works as expected:


library(mia)

data(GlobalPatterns)
tse <- GlobalPatterns
rowTree(tse) <- NULL

se <- as(tse, "SummarizedExperiment")

# All index names as known by the function
index <- c("shannon","gini_simpson","inverse_simpson", "coverage", "fisher", "faith",  "log_modulo_skewness")
index2 <- c(index, "test")

# Should give warning since default set includes faith
res <- estimateDiversity(tse)
res <- estimateDiversity(tse, index = index)

# Should give no warning
res <- estimateDiversity(se)
# Should lead to informative error, since faith is not supported
res <- estimateDiversity(se, index = index)

# Should give informative wrror
res <- estimateDiversity(tse, index = index2)
res <- estimateDiversity(se, index = index2)