refunders / refund

Regression with functional data

fosr error: vector memory exhausted (limit reached?) #86

Closed: julia-wrobel closed this issue 5 years ago

julia-wrobel commented 6 years ago

A memory error occurs on datasets that should be small enough for the function to handle. The error message and `traceback()` output are below:

```
> model_fosr = fosr(Y = Y_mat, X = covars_mat)
Finding optimal lambda by optimize()...
Error: vector memory exhausted (limit reached?)

> traceback()
6: tcrossprod(scale(QU, center = FALSE, scale = 1 + lam * svd211$d), QU)
5: f(arg, ...)
4: (function (arg) f(arg, ...))(3.22223828777439)
3: optimize(cvfcn, c(0, maxlam), tol = 0.01)
2: lofocv(respmat, X.sc %x% Bmat, S1 = pen[[1]], argvals = argvals,
   lamvec = lambda, constr = constr, maxlam = maxlam)
1: fosr(Y = Y_mat, X = covars_mat)
```

Here is some code that will reproduce the error:

```r
library(tidyverse)
library(refund)

set.seed(1988)
dat = pffrSim(n = 300, nygrid = 200, scenario = "int")

covars_df = data.frame(
  intercept = rep(1, 300),
  age = runif(300, 20, 80),
  height = rnorm(300, 68),
  weight = rnorm(300, 150)
)

covars_mat = as.matrix(covars_df)
Y_mat = as.matrix(dat$Y)

model_fosr = fosr(Y = Y_mat, X = covars_mat)
```
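As a quick sanity check (my annotation; the dimensions follow from the simulation settings above), the objects handed to `fosr()` are:

```r
dim(Y_mat)       # 300 x 200: n curves x grid points
dim(covars_mat)  # 300 x 4:   n curves x covariates (incl. intercept)
```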

fabian-s commented 6 years ago

Thx @julia-wrobel for nailing down the offending line.

`QU` has dimensions (#observations * #timepoints) x (#basis functions of all effects combined).

For the example, `QU` is 60000 x 60, so `tcrossprod` tries to create a dense 60000 x 60000 matrix, which isn't such a terrific idea ... :hankey:
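To put a number on that (my arithmetic, not from the thread): a dense 60000 x 60000 matrix of doubles needs roughly 27 GiB, which is exactly the kind of allocation that triggers the "vector memory exhausted" error.

```r
# Back-of-the-envelope memory cost of the dense hat matrix (my estimate):
n <- 300 * 200   # observations * timepoints = 60000 rows in QU
n^2 * 8 / 2^30   # 8 bytes per double, converted to GiB
#> [1] 26.82209
```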

I wonder how that ever really worked for reasonably sized data? `lofocv` hasn't been modified in 3 years, so this can't be a new failure mode, despite what Vadim wrote initially about seeing this problem only recently.

The subsequent line needs this huge hat matrix to compute the residuals for that specific lambda, so the only easy fix I see is changing

```r
if (length(lambda) != 1 | cv1) {
```

in `fosr.R` to

```r
if (length(lambda) > 1 | cv1) {
```

(which might have been intended in any case?) so that the ill-conceived `lofocv` function never gets called for the default spec `lambda = NULL`, and we immediately dispatch to `amc`, which calls mgcv and can handle biggish data sets....
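For clarity (my illustration, not package code), the reason the one-character change reroutes the default: `length(NULL)` is 0, so the old `!= 1` test fires for `lambda = NULL` while the new `> 1` test does not.

```r
lambda <- NULL
length(lambda)        # 0
length(lambda) != 1   # TRUE  -> old condition sends the default into lofocv()
length(lambda) > 1    # FALSE -> new condition dispatches straight to amc()/mgcv
```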

Gotta run now, but if @philreiss doesn't object and you have time to check that this doesn't do anything else terrible in the dev branch, that would be the best move to make Vadim & co happy again, I think.

julia-wrobel commented 6 years ago

There may be deeper issues. After changing the line @fabian-s suggested, I get the same `Error: vector memory exhausted (limit reached?)`, but with a different traceback:

```
> traceback()
5: outer(X, Y, FUN, ...)
4: .kronecker(X, Y, FUN = FUN, make.dimnames = make.dimnames, ...)
3: kronecker(X, Y)
2: diag(ncurve) %x% covmat
1: fosr(Y = Y_mat, X = covars_mat)
```

@fabian-s do you have any other ideas for patches?

fabian-s commented 6 years ago

I think this is the same issue (trying to create a huge square matrix with ncurve * gridlength columns) at a different code location.... Since it is block-diagonal, maybe we could just do the operation blockwise? Travelling atm, can't really look at the code properly, but roughly something like the sketch below.
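(Just a sketch under stated assumptions, not the actual patch in PR #88: `Matrix::bdiag()` builds the block-diagonal result sparsely, so the dense Kronecker product is never materialized. The sizes here are toy stand-ins.)

```r
library(Matrix)

ncurve <- 300
covmat <- diag(4)   # toy stand-in; the real covmat is gridlength x gridlength

# Dense Kronecker product: stores all (ncurve * ncol(covmat))^2 entries
dense <- diag(ncurve) %x% covmat

# Sparse block-diagonal: stores only the nonzero blocks
sparse <- bdiag(replicate(ncurve, covmat, simplify = FALSE))

object.size(dense)   # ~11.5 MB even at this toy size; ~27 GiB at the real size
object.size(sparse)  # orders of magnitude smaller
```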

fabian-s commented 5 years ago

@julia-wrobel do you have capacity to test the patch I made on this? (PR #88)

julia-wrobel commented 5 years ago

@fabian-s Examples from the fosr documentation yield the same results in both the patched version and the current master branch. However, for the pathological example the patched code no longer produces `Error: vector memory exhausted (limit reached?)`; instead it produces a set of warnings: `In optimize(cvfcn, c(0, maxlam), tol = 0.01) : NA/Inf replaced by maximum positive value`.

Is this something we're willing to tolerate?

fabian-s commented 5 years ago

Model results look very reasonable to me, so I guess we can live with that.

fabian-s commented 5 years ago

Thx, @julia-wrobel