Closed VilenneFrederique closed 2 years ago
After digging into the code, the following change fixed it for me.
In MCV.block.splsda.R, add these two lines after line 130:
n = nrow(X[[1]])
repeated.measure = 1:n
This seems to work for me. Please confirm if it does! Thanks in advance!
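For readers who want to see why the two added lines matter, here is a minimal sketch in base R. It only illustrates the error message and the effect of defining n from the first block; the toy list X is a hypothetical stand-in for the real data blocks, and nothing here reproduces the internals of MCV.block.splsda.R.

```r
# Base R raises exactly this message when the upper bound of `:` is NA.
n <- NA_integer_
msg <- tryCatch(1:n, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical toy blocks (assumption: every block has the same samples in rows).
X <- list(
  blockA = matrix(0, nrow = 4, ncol = 3),
  blockB = matrix(0, nrow = 4, ncol = 2)
)

# The two lines from the proposed fix: take the sample count from the
# first block, then index the samples 1..n.
n <- nrow(X[[1]])
repeated.measure <- 1:n
print(repeated.measure)
```

With n defined as the number of rows in the first block, 1:n is a valid sequence, so the call that previously failed has a usable index of samples.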
Thanks so much @VilenneFrederique for the comprehensive post. This has been fixed, and the fix is currently live on branch issue-213. Once we have gone through all the relevant tests, it will be merged and available in the Bioconductor build.
Describe the bug: Dear all,
I've been working on a research project consisting of a microbiome data set and several metabolomics data sets. In this data set, I have only 31 samples. I wanted to use DIABLO to find relationships between the ASVs in the microbiome data and the metabolites. I am using the DIABLO TCGA as a guideline. Since I have so few samples and limited computational power, I opted to use LOOCV instead of the Mfold option. This worked fine during the initial tuning of the number of components. However, during the tune.block.splsda step I encounter a bug. It always gives the following error:
Error in 1:n : NA/NaN argument
To make sure this isn't a problem with my own data sets, I retried it using the TCGA data set, which runs into the same problem. The entire code is given below in the reproducible example using the TCGA data set.
reprex results from reproducible example including sessioninfo():
The code used:
library(mixOmics)
data("breast.TCGA")
data = list(miRNA = breast.TCGA$data.train$mirna, mRNA = breast.TCGA$data.train$mrna, proteomics = breast.TCGA$data.train$protein)
Y = breast.TCGA$data.train$subtype
design = matrix(0.1, ncol = length(data), nrow = length(data), dimnames = list(names(data), names(data)))
diag(design) = 0 # set diagonal to 0s
# set grid of values for each component to test
test.keepX = list (mRNA = c(5:9, seq(10, 18, 2), seq(20,30,5)), miRNA = c(5:9, seq(10, 18, 2), seq(20,30,5)), proteomics = c(5:9, seq(10, 18, 2), seq(20,30,5)))
# run the feature selection tuning
tune.TCGA = tune.block.splsda(X = data, Y = Y, ncomp = 2, test.keepX = test.keepX, design = design, validation = "loo")
Output: Design matrix has changed to include Y; each block will be linked to Y.
You have provided a sequence of keepX of length: 13 for block mRNA and 13 for block miRNA and 13 for block proteomics. This results in 2197 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
You can look into the 'BPPARAM' argument to speed up computation time.
Error in 1:n : NA/NaN argument
SessionInfo:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
LC_COLLATE=Dutch_Belgium.utf8
LC_CTYPE=Dutch_Belgium.utf8
LC_MONETARY=Dutch_Belgium.utf8
LC_NUMERIC=C
LC_TIME=Dutch_Belgium.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mixOmics_6.20.0 ggplot2_3.3.5 lattice_0.20-45 MASS_7.3-57
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 RSpectra_0.16-1 plyr_1.8.7 pillar_1.7.0 compiler_4.2.0 RColorBrewer_1.1-3
[7] tools_4.2.0 lifecycle_1.0.1 tibble_3.1.6 gtable_0.3.0 pkgconfig_2.0.3 rlang_1.0.2
[13] Matrix_1.4-1 igraph_1.3.1 DBI_1.1.2 cli_3.3.0 ggrepel_0.9.1 parallel_4.2.0
[19] gridExtra_2.3 stringr_1.4.0 withr_2.5.0 dplyr_1.0.9 generics_0.1.2 vctrs_0.4.1
[25] grid_4.2.0 tidyselect_1.1.2 glue_1.6.2 ellipse_0.4.2 R6_2.5.1 fansi_1.0.3
[31] rARPACK_0.11-0 BiocParallel_1.30.0 tidyr_1.2.0 reshape2_1.4.4 purrr_0.3.4 corpcor_1.6.10
[37] magrittr_2.0.3 scales_1.2.0 ellipsis_0.3.2 matrixStats_0.62.0 assertthat_0.2.1 colorspace_2.0-3
[43] utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 crayon_1.5.1
Expected behavior:
One would expect LOOCV to start and run to completion for the feature-selection tuning.
Possible solution:
None, sorry :-)