mixOmicsTeam / mixOmics

Development repository for the Bioconductor package 'mixOmics '
http://mixomics.org/
164 stars 54 forks source link

enhancement request for `tune`: t.test.process choice ignores bounded nature of performance criteria #101

Open aljabadi opened 4 years ago

aljabadi commented 4 years ago

many times users have complained that the chosen keepX is too strict. Here's an example for tune.spca for component 2:

suppressMessages(library(mixOmics))
data(multidrug)
set.seed(2341)
tune.spca_res1 <- tune.spca(ncomp = 2, X = multidrug$ABC.trans, nrepeat = 5, folds = 3, test.keepX = seq(5,15,5))
#> KeepX =  5 
#> KeepX =  10 
#> KeepX =  15 
#> KeepX =  5 
#> KeepX =  10 
#> KeepX =  15
plot(tune.spca_res1) 

Created on 2020-10-02 by the reprex package (v0.3.0)

Session info ``` r devtools::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.0.2 (2020-06-22) #> os macOS Catalina 10.15 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_AU.UTF-8 #> ctype en_AU.UTF-8 #> tz Australia/Melbourne #> date 2020-10-02 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0) #> backports 1.1.10 2020-09-15 [1] CRAN (R 4.0.2) #> BiocParallel 1.23.2 2020-07-06 [1] Bioconductor #> callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2) #> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0) #> colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0) #> corpcor 1.6.9 2017-04-01 [1] CRAN (R 4.0.0) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0) #> curl 4.3 2019-12-02 [1] CRAN (R 4.0.0) #> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0) #> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2) #> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0) #> dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.2) #> ellipse 0.4.2 2020-05-27 [1] CRAN (R 4.0.2) #> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0) #> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0) #> farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.0) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2) #> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0) #> ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2) #> ggrepel 0.8.2 2020-03-08 [1] CRAN (R 4.0.0) #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2) #> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.0) #> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0) #> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0) #> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2) #> httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2) #> igraph 1.2.5 2020-03-19 [1] CRAN (R 4.0.0) #> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2) #> labeling 0.3 2014-08-23 [1] CRAN (R 4.0.0) #> lattice * 0.20-41 2020-04-02 [1] CRAN (R 4.0.2) #> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0) #> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2) #> MASS * 7.3-53 2020-09-09 [1] CRAN (R 4.0.2) #> Matrix 1.2-18 2019-11-27 [1] CRAN (R 4.0.2) #> matrixStats 0.57.0 2020-09-25 [1] CRAN (R 4.0.2) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0) #> mime 0.9 2020-02-04 [1] CRAN (R 4.0.0) #> mixOmics * 6.13.67 2017-02-06 [1] Bioconductor (R 4.0.2) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0) #> pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2) #> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0) #> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2) #> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.0) #> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0) #> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2) #> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0) #> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0) #> rARPACK 0.11-0 2016-03-10 [1] CRAN (R 4.0.0) #> RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.0) #> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2) #> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2) #> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.0) #> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2) #> rmarkdown 2.4 2020-09-30 [1] CRAN (R 4.0.2) #> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0) #> RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.0.0) #> scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0) #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0) #> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0) #> tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2) #> tidyr 1.1.2 2020-08-27 [1] CRAN (R 4.0.2) #> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0) #> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.2) #> vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2) #> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2) #> xfun 0.18 2020-09-29 [1] CRAN (R 4.0.2) #> xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.0) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0) #> #> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library ```

Initially, we thought it's a threshold issue, but that's not always the case as we get close to 100%.

possible culprit

We measure the significance of the improvement in error rate or correlation to decide the optimality. It ignores the bounded (truncated?) nature of the performance criteria (e.g. correlation is b/w [-1,1]).

possible solution

Adjust the measure based on a truncated distribution, (or transform current measures to take them into an unbounded space?)

Max-Bladen commented 2 years ago

Hi @aljabadi

Few questions about this one: