julia-wrobel closed this issue 5 years ago
Thx @julia-wrobel for nailing down the offending line.
QU has dimensions (#observations * #timepoints) x (#basis functions of all effects combined).
For the example, QU is 60000 x 60, so that tcrossprod tries to create a dense 60000 x 60000 matrix, which isn't such a terrific idea ... :hankey:
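For scale, a quick back-of-the-envelope check of what that dense matrix would need (the 60000 comes from the 300 curves x 200 grid points in the reproduction below):

```r
# Rough size of the dense matrix tcrossprod() would have to allocate:
# 60000 x 60000 doubles, 8 bytes each.
n <- 300 * 200        # observations * timepoints in the example
n^2 * 8 / 2^30        # ~26.8 GiB, far beyond a typical R memory limit
```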
I wonder how that ever really worked for reasonably sized data? lofocv
hasn't been modified in 3 years, so this can't be a new failure mode, despite what Vadim wrote initially about only seeing this problem recently.
the subsequent line needs this big-ass hat matrix to compute the residuals for that specific lambda, so the only easy fix i see is changing `if (length(lambda) != 1 | cv1) {` in fosr.R to `if (length(lambda) > 1 | cv1) {` (which might have been intended in any case?) so that the ill-conceived lofocv function never gets called for the default spec lambda = NULL and we immediately dispatch to amc, which calls mgcv and can handle biggish data sets....
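for context, a minimal check of why the default lambda = NULL takes the lofocv branch under the current condition but not under the proposed one (cv1 is set to FALSE here purely to isolate the length() test):

```r
# length(NULL) is 0, so the two conditions disagree exactly for the default case
lambda <- NULL
cv1 <- FALSE                  # illustration only, to isolate the length() check

length(lambda) != 1 | cv1     # TRUE  -> current code calls lofocv()
length(lambda) > 1  | cv1     # FALSE -> proposed code goes straight to amc()/mgcv
```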
gotta run now, but if @philreiss doesn't object and you have time to check that this doesn't do anything else that's terrible in the dev branch, that would be the best move to make Vadim & co happy again, I think.
There may be deeper issues. After changing the line @fabian-s suggested, I get the same `Error: vector memory exhausted (limit reached?)`, but with a different traceback:
```
traceback()
5: outer(X, Y, FUN, ...)
4: .kronecker(X, Y, FUN = FUN, make.dimnames = make.dimnames, ...)
3: kronecker(X, Y)
2: diag(ncurve) %x% covmat
1: fosr(Y = Y_mat, X = covars_mat)
```
@fabian-s do you have any other ideas for patches?
i think this is the same issue (trying to create a huge square matrix with ncurve * gridlength columns) at a different code location.... since it is block-diagonal, maybe we could just do the operation blockwise? travelling atm, can't really look at the code
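roughly what i have in mind, as an untested sketch: build the same block-diagonal matrix sparsely instead of via the dense Kronecker product (the ncurve/covmat values below are just stand-ins, not the actual fosr internals):

```r
library(Matrix)

ncurve <- 300
covmat <- diag(200)    # stand-in for the per-curve covariance block

# dense version from the traceback; for data this size it exhausts memory:
# big <- diag(ncurve) %x% covmat            # 60000 x 60000 dense matrix

# sparse block-diagonal equivalent holding the same blocks:
big_sparse <- bdiag(replicate(ncurve, covmat, simplify = FALSE))
dim(big_sparse)        # 60000 x 60000, but only the diagonal blocks are stored
```

whether the downstream code can consume a sparse Matrix object there is a separate question, but memory-wise something like this would scale.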
@julia-wrobel do you have capacity to test the patch I made on this? (PR #88)
@fabian-s Examples from the fosr documentation yield the same results in both the patched version and the current master branch. However, for the pathological example, the patched code no longer produces the error "Error: vector memory exhausted (limit reached?)" but instead produces a set of warnings: "In optimize(cvfcn, c(0, maxlam), tol = 0.01) : NA/Inf replaced by maximum positive value".
Is this something we're willing to tolerate?
model results look very reasonable to me, so I guess we can live with that.
thx, @julia-wrobel
A memory error occurs on datasets that should be small enough for the function to handle. The error message and traceback() are below:
```
> model_fosr = fosr(Y = Y_mat, X = covars_mat)
Finding optimal lambda by optimize()...
Error: vector memory exhausted (limit reached?)
```
Here is some code that will reproduce the error:
```r
library(tidyverse)
library(refund)

set.seed(1988)
dat = pffrSim(n = 300, nygrid = 200, scenario = "int")

covars_df = data.frame(
  intercept = rep(1, 300),
  age = runif(300, 20, 80),
  height = rnorm(300, 68),
  weight = rnorm(300, 150)
)

covars_mat = as.matrix(covars_df)
Y_mat = as.matrix(dat$Y)

model_fosr = fosr(Y = Y_mat, X = covars_mat)
```