mixOmicsTeam / mixOmics

Development repository for the Bioconductor package 'mixOmics'
http://mixomics.org/

LOOCV validation tune.block.splsda error #213

Closed: VilenneFrederique closed this issue 2 years ago

VilenneFrederique commented 2 years ago

🐞 Describe the bug: Dear all,

I've been working on a research project consisting of a microbiome data set and several metabolomics data sets, with only 31 samples. I want to use DIABLO to find relationships between the ASVs in the microbiome data and the metabolites, and I am using the DIABLO TCGA example as a guideline. Because I have so few samples and limited computational power, I opted for LOOCV instead of the Mfold option. This worked fine during the initial tuning of the principal components. However, tune.block.splsda always fails with the following error:

Error in 1:n : NA/NaN argument

To make sure this isn't a problem with my own data sets, I retried it with the TCGA data set, which hits the same error. The full code is given below in the reproducible example using the TCGA data set.


πŸ” reprex results from reproducible example including sessioninfo():

The code used:

    library(mixOmics)
    data("breast.TCGA")
    data = list(miRNA = breast.TCGA$data.train$mirna,
                mRNA = breast.TCGA$data.train$mrna,
                proteomics = breast.TCGA$data.train$protein)
    Y = breast.TCGA$data.train$subtype
    design = matrix(0.1, ncol = length(data), nrow = length(data),
                    dimnames = list(names(data), names(data)))
    diag(design) = 0 # set diagonal to 0s

    # set grid of values for each component to test
    test.keepX = list(mRNA = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                      miRNA = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                      proteomics = c(5:9, seq(10, 18, 2), seq(20, 30, 5)))

    # run the feature selection tuning
    tune.TCGA = tune.block.splsda(X = data, Y = Y, ncomp = 2,
                                  test.keepX = test.keepX, design = design,
                                  validation = "loo")

Output: Design matrix has changed to include Y; each block will be linked to Y.

You have provided a sequence of keepX of length: 13 for block mRNA and 13 for block miRNA and 13 for block proteomics. This results in 2197 models being fitted for each component and each nrepeat, this may take some time to run, be patient!

You can look into the 'BPPARAM' argument to speed up computation time.

Error in 1:n : NA/NaN argument
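The model count in the message above follows directly from the grid sizes. A minimal sketch in base R, reusing the `test.keepX` grid from the reprex (the names `grid.sizes` and `n.models` are only illustrative and are not part of mixOmics):

```r
# Same grid as in the reprex: 13 candidate keepX values per block.
keepX.grid <- c(5:9, seq(10, 18, 2), seq(20, 30, 5))
test.keepX <- list(mRNA = keepX.grid,
                   miRNA = keepX.grid,
                   proteomics = keepX.grid)

# Number of candidate values per block (illustrative helper names).
grid.sizes <- lengths(test.keepX)

# Every combination across the three blocks is fitted, so the counts multiply.
n.models <- prod(grid.sizes)

print(grid.sizes)
print(n.models)
```

With 13 values for each of the three blocks this gives 13 × 13 × 13 = 2197 combinations per component, matching the message. A coarser grid would presumably reduce the run time accordingly.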

SessionInfo:

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
LC_COLLATE=Dutch_Belgium.utf8
LC_CTYPE=Dutch_Belgium.utf8
LC_MONETARY=Dutch_Belgium.utf8
LC_NUMERIC=C
LC_TIME=Dutch_Belgium.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mixOmics_6.20.0 ggplot2_3.3.5 lattice_0.20-45 MASS_7.3-57

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 RSpectra_0.16-1 plyr_1.8.7 pillar_1.7.0 compiler_4.2.0 RColorBrewer_1.1-3
[7] tools_4.2.0 lifecycle_1.0.1 tibble_3.1.6 gtable_0.3.0 pkgconfig_2.0.3 rlang_1.0.2
[13] Matrix_1.4-1 igraph_1.3.1 DBI_1.1.2 cli_3.3.0 ggrepel_0.9.1 parallel_4.2.0
[19] gridExtra_2.3 stringr_1.4.0 withr_2.5.0 dplyr_1.0.9 generics_0.1.2 vctrs_0.4.1
[25] grid_4.2.0 tidyselect_1.1.2 glue_1.6.2 ellipse_0.4.2 R6_2.5.1 fansi_1.0.3
[31] rARPACK_0.11-0 BiocParallel_1.30.0 tidyr_1.2.0 reshape2_1.4.4 purrr_0.3.4 corpcor_1.6.10
[37] magrittr_2.0.3 scales_1.2.0 ellipsis_0.3.2 matrixStats_0.62.0 assertthat_0.2.1 colorspace_2.0-3
[43] utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 crayon_1.5.1

🤔 Expected behavior:

I would expect LOOCV to run during the tuning of the features, as it does with the Mfold option.


💡 Possible solution:

None, sorry :-)

VilenneFrederique commented 2 years ago

After digging into the code, the following has fixed it:

File: MCV.block.splsda.R. Add the following after line 130:

    n = nrow(X[[1]])
    repeated.measure = 1:n

This seems to work for me. Please confirm if it does! Thanks in advance!
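For context, the error text is what base R raises when the upper bound of `1:n` is `NA`. The sketch below reproduces that failure mode in isolation and then applies the two lines proposed above to toy data. It is only an illustration under assumptions: the `NA` value of `n` and the toy blocks in `X` are hypothetical stand-ins, not the actual state inside MCV.block.splsda.R.

```r
# Hypothetical stand-in: an 'n' that ended up NA on the "loo" path.
n <- NA_integer_
msg <- tryCatch(
  {
    1:n
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Toy blocks with 31 samples each (hypothetical data, not the TCGA set).
X <- list(blockA = matrix(0, nrow = 31, ncol = 4),
          blockB = matrix(0, nrow = 31, ncol = 6))

# The proposed fix: take n from the first block before building the sequence.
n <- nrow(X[[1]])
repeated.measure <- 1:n
print(length(repeated.measure))
```

Taking `n` from `X[[1]]` assumes that all blocks share the same samples in the same order, which block.splsda-type models require anyway.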

Max-Bladen commented 2 years ago

Thanks so much @VilenneFrederique for the comprehensive post. This has been fixed and is currently live on branch issue-213. Once we have gone through all the relevant tests, this will be merged and available in the Bioconductor build.