Open fataltes opened 2 months ago
Hey @fataltes!
Thank you for raising this issue. This problem is related to issues with the R svd
function, but some people managed to find a fix in other packages relying on the SVD. I tried replicating their fix in https://github.com/rgcca-factory/RGCCA/tree/fix_lapack_error_with_svd. Can you try out the branch to see if it solves your problem?
Best, Fabien
Thanks for your quick response @GFabien,
I switched to that branch and now the error changed to a NAN case error. Here is the traceback:
Error in if (any(x$a != 0)) {: missing value where TRUE/FALSE needed
Traceback:
1. system.time(ncomp_cv_out <- ncomp_cv(blocks, grid_search_matrix,
. method))
2. ncomp_cv(blocks, grid_search_matrix, method)
3. rgcca_cv(blocks, response = 4, method = cur_method, par_type = "ncomp",
. par_value = grid_search_matrix, init = "rand", tau = 0.5,
. prediction_model = "glmnet", metric = "F1", k = 3, n_run = 5,
. verbose = TRUE, quiet = FALSE) # at line 4-14 of file <text>
4. par_pblapply(idx, function(n) {
. i <- (n - 1)%/%length(v_inds) + 1
. j <- (n - 1)%%length(v_inds) + 1
. rgcca_cv_k(rgcca_args, inds = v_inds[[j]], metric = metric,
. par_type = param$par_type, par_value = param$par_value[i,
. ], prediction_model = model$prediction_model, ...)
. }, n_cores = n_cores, verbose = verbose)
5. pbapply::pblapply(X, FUN, ..., cl = cl)
6. lapply(X, FUN, ...)
7. FUN(X[[i]], ...)
8. rgcca_cv_k(rgcca_args, inds = v_inds[[j]], metric = metric, par_type = param$par_type,
. par_value = param$par_value[i, ], prediction_model = model$prediction_model,
. ...)
9. do.call(rgcca, rgcca_args)
10. (function (blocks, connection = NULL, tau = 1, ncomp = 1, scheme = "factorial",
. scale = TRUE, init = "svd", bias = TRUE, tol = 1e-08, verbose = FALSE,
. scale_block = "inertia", method = "rgcca", sparsity = 1,
. response = NULL, superblock = FALSE, NA_method = "na.ignore",
. quiet = TRUE, n_iter_max = 1000, comp_orth = TRUE, A = NULL,
. C = NULL)
. {
. if (!missing(A)) {
. warning("Argument A is deprecated, use blocks instead.")
. blocks <- A
. }
. if (!missing(C)) {
. warning("Argument C is deprecated, use connection instead.")
. connection <- C
. }
. rgcca_args <- as.list(environment())
. tmp <- get_rgcca_args(blocks, rgcca_args)
. opt <- tmp$opt
. rgcca_args <- tmp$rgcca_args
. rgcca_args$quiet <- quiet
. rgcca_args$verbose <- verbose
. blocks <- remove_null_sd(rgcca_args$blocks)$list_m
. if (opt$disjunction) {
. blocks[[rgcca_args$response]] <- as_disjunctive(blocks[[rgcca_args$response]])
. }
. tmp <- handle_NA(blocks, NA_method = rgcca_args$NA_method)
. na.rm <- tmp$na.rm
. blocks <- scaling(tmp$blocks, scale = rgcca_args$scale, bias = rgcca_args$bias,
. scale_block = rgcca_args$scale_block, na.rm = na.rm)
. if (rgcca_args$superblock) {
. blocks[["superblock"]] <- Reduce(cbind, blocks)
. colnames(blocks[["superblock"]]) <- paste0("s-", colnames(blocks[["superblock"]]))
. }
. gcca_args <- rgcca_args[c("connection", "ncomp", "scheme",
. "init", "bias", "tol", "verbose", "superblock", "response",
. "n_iter_max", "comp_orth")]
. gcca_args[["na.rm"]] <- na.rm
. gcca_args[["blocks"]] <- blocks
. gcca_args[["disjunction"]] <- opt$disjunction
. gcca_args[[opt$param]] <- rgcca_args[[opt$param]]
. func_out <- do.call(rgcca_outer_loop, gcca_args)
. func_out <- format_output(func_out, rgcca_args, opt, blocks)
. class(func_out) <- "rgcca"
. invisible(func_out)
. })(tau = c(0.5, 0.5, 0.5, 0), tol = 1e-08, init = "random", bias = TRUE,
. quiet = TRUE, scale = TRUE, ncomp = c(1, 1, 319, 319), blocks = list(
.....
The error is apparently coming from here: https://github.com/rgcca-factory/RGCCA/blob/eefad3ba95cfbf285ceabcdd2fdd7c0329d77f7b/R/block_project.R#L28
Before the error I also get this warning:
Missing colnames are automatically labeled.
Missing rownames are automatically labeled.
Warning message in sqrt(t(x$alpha) %*% x$M %*% x$K %*% x$alpha):
“NaNs produced”
Which I think is coming from here: https://github.com/rgcca-factory/RGCCA/blob/eefad3ba95cfbf285ceabcdd2fdd7c0329d77f7b/R/block_project.R#L40
I'd appreciate if you have any suggestions or updates on the code for me to try.
Thanks, Fatemeh
Hi Fatemeh,
What I guess is that t(x$alpha) %*% x$M %*% x$K %*% x$alpha
produces a negative value so the square root creates a NaN. From a theoretical point of view, the matrix x$M %*% x$K
is symmetric and positive semi-definite, so the issue should not happen. However, it might be possible that due to numerical imprecision, the computed matrix has close to zero negative eigenvalues, which should correspond to zero eigenvalues of the theoretical x$M %*% x$K
matrix.
Anyway, it means that a = t(x$x) %*% alpha
is close to zero so the block does not contribute much to the objective function. I would say that you can reduce the number of components extracted for this particular block to remove the error.
A fix of the code might be to check on the value t(x$alpha) %*% x$M %*% x$K %*% x$alpha
to see if it is smaller or equal to zero and set alpha to zero if so. To further investigate, I would need a reproducible example, would you be able to create one?
I hope this helps!
Best, Fabien
Hi @GFabien ,
Thanks for the response!
Unfortunately, I cannot share the data and am trying to both apply your suggestion and generate a case that reproduces the issue on some random data.
But if you can share some contact, maybe I can follow up with you on a 1:1 meeting and then you can update the repo and issue here with the solution.
Thank you, Fatemeh
Hi @GFabien , That would be great if you can give me some point of contact to reach out for the 1:1 meeting to go over the data and any suggested modifications/reruns by you.
Hi @fataltes, Sorry for the late reply, I sent you an email to set up a 1:1 meeting. Best, Fabien
Hi @GFabien ,
Thanks for updating the code. It now gives me another error which indicates x$alpha has NAN itself. I don't know if it is possible or what should one do in such a case.
Thanks, Fatemeh
Hi @fataltes,
I updated the branch again, you shouldn't have NAN problems anymore. Please tell me if there are new errors or if it is fine now.
Best, Fabien
Hi @GFabien ,
Thank you, I tested the code. It actually runs but never finishes. On a fairly small dataset of size 1000*1050 with three imbalanced blocks, RGCCA_cv has been running for over 10 hours. I don't think this is expected. right? Could there be an infinite loop or such a thing happening?
Thank you
Hmmm... There shouldn't be. Did you run with verbose = TRUE
? If so did the progress bar stay at 0%?
Is it working if you run RGCCA without cross validation but with the same parameters?
Hi @fataltes, How are you? Do you still face the problems you reported? Best, Fabien
Hi @GFabien , Unfortunately yes. I still face the issue and there is another problem which I'm facing: It takes a loong time to run RGCCA on a small dataset if I activate parallelization on R.
Hi,
Running rgcca_cv with the following options:
I get the following error:
What do you think can be causing this and if you have any suggestions to try?
Thank you, Fatemeh