Possible use of sparse matrices

pbreheny / ncvreg

Regularization paths for SCAD- and MCP-penalized regression models

http://pbreheny.github.io/ncvreg


Possible use of sparse matrices #3

Open svazzole opened 8 years ago

svazzole commented 8 years ago

Dear Patrick, first of all, thanks for this package. Do you think it would be possible to add support for sparse matrices (from the Matrix package)?

pbreheny commented 8 years ago

Do you mean like this?

library(Matrix)
library(ncvreg)

i <- c(1,3:8); j <- c(2,9,6:10); x <- 7 * (1:7)
X <- sparseMatrix(i, j, x = x)  # 8 x 10 sparse design matrix
y <- rnorm(8)
ncvreg(X, y)

I agree, that should be supported directly, and will be easy to fix. In the meantime, this should suffice as a workaround:

ncvreg(as.matrix(X), y)

Or am I not understanding your question?

svazzole commented 8 years ago

Yes, that is exactly what I meant. I will try to explain my problem better. I'm developing a package (called sparsevar) to estimate sparse VAR models using penalized least squares methods (LASSO, SCAD and MC+). When using LASSO (and in particular the glmnet package) I can store the data in sparse matrices, which results in much lower RAM consumption and faster computation. To give you an idea: with LASSO, one can estimate a 200x200 VAR(1) model in 4 minutes. With SCAD and MC+ (using this package), one is limited to a 50x50 VAR(1) model. On the other hand, SCAD and MC+ give much better results in the estimation of the matrices (better accuracy, smaller norm error and RMSE). So I was wondering whether it would be possible, in your opinion, to use the Matrix package (and in particular the sparseMatrix class) directly in the ncvreg function used for the estimation. Thanks for your attention, Simon

pbreheny commented 8 years ago

Now I'm confused. You want to pass a sparse design matrix X into ncvreg, or you want ncvreg to use sparse matrices internally for beta, as glmnet does? If it's the former, I'm having a hard time seeing how that would reduce RAM usage much, since even if the original is sparse, the standardized design matrix wouldn't be.
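The point about standardization can be checked with a small sketch. The matrix below is random illustrative data generated with `rsparsematrix` from the Matrix package (an assumption, not data from this thread). Centering each column turns nearly every stored zero into a nonzero value, so a standardized copy of a sparse design is essentially dense.

```r
library(Matrix)

set.seed(1)
# Illustrative data (an assumption, not from the thread): 1000 x 50, about 2% nonzero
X <- rsparsematrix(1000, 50, density = 0.02)

# Center and scale each column, as an explicit dense standardization step would
X_std <- scale(as.matrix(X))

frac_nonzero_before <- mean(as.matrix(X) != 0)
frac_nonzero_after  <- mean(X_std != 0)

frac_nonzero_before  # small: most entries are zero
frac_nonzero_after   # close to 1: each zero became -mean/sd for its column
```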

svazzole commented 8 years ago

I want to pass a sparse matrix to ncvreg. The design matrix that I'm passing is a Kronecker product, something like:

x <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
I <- Matrix::Diagonal(2)
X <- Matrix::kronecker(I,x)

So X is a sparse block diagonal matrix. With LASSO I have also tried a standard dense matrix, and switching to the sparse one greatly reduces RAM usage (and also increases execution speed).
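To give a rough sense of the memory difference, here is a minimal sketch that builds a larger block diagonal design the same way and compares it with its dense copy. The dimensions and the random block `x` are made up for illustration; the actual savings depend on the number and size of the blocks.

```r
library(Matrix)

set.seed(1)
# Hypothetical sizes, chosen only for illustration
k <- 20                                     # number of diagonal blocks
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)

X_sparse <- kronecker(Diagonal(k), x)       # sparse block diagonal, 2000 x 400
X_dense  <- as.matrix(X_sparse)             # same values, stored densely

dim(X_sparse)
is(X_sparse, "sparseMatrix")

# Only the k diagonal blocks are stored in the sparse version
print(object.size(X_sparse), units = "Kb")
print(object.size(X_dense), units = "Kb")
```

Because only 1 in `k` entries is nonzero here, the gap widens as the number of blocks grows.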

pbreheny commented 8 years ago

OK, I understand now. Yes, I think this would be a nice improvement to ncvreg, although unfortunately, it will require some work -- disregard my earlier comment about "easy to fix"; that was assuming you were just talking about an interface, not changing the internal workings of ncvreg. I'm labeling this an "enhancement" and leaving the issue open, although if I'm being honest, I wouldn't expect this feature to be added any time soon (not because I don't think it's important, just because there are only so many hours in the day).

svazzole commented 8 years ago

I suspected it was something not so easy to fix. I've forked the repo to see if I can work on it. Thanks, Simon

pbreheny commented 8 years ago

Well, if you get it working, I'd be happy to merge it in. I suspect the most time-consuming part will be rewriting the C-level code.
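One reason the internal rewrite is more than an interface change is that the solver works with standardized columns. A common way to keep the design sparse is to never form the standardized matrix, and instead correct each inner product using stored column means and scales. The sketch below illustrates that identity in R. It is not ncvreg's code: the data are made up, and dividing by n for the standard deviation is an assumption.

```r
library(Matrix)

set.seed(1)
n <- 200; p <- 30
X <- rsparsematrix(n, p, density = 0.1)  # illustrative sparse design
r <- rnorm(n)                            # stands in for a residual vector

# Column means and (population) standard deviations, computed on the sparse matrix
m <- colMeans(X)
s <- sqrt(colMeans(X^2) - m^2)

# Inner products of the standardized columns with r, without densifying X:
#   ((x_j - m_j) / s_j)' r = (x_j' r - m_j * sum(r)) / s_j
z_sparse <- (drop(as.matrix(crossprod(X, r))) - m * sum(r)) / s

# Dense reference, for comparison only
X_std <- sweep(sweep(as.matrix(X), 2, m, "-"), 2, s, "/")
z_dense <- drop(crossprod(X_std, r))

all.equal(z_sparse, z_dense)
```

A C-level implementation would need the same correction inside every coordinate update, which is part of why the change touches the core routines.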

swzCuroverse commented 5 years ago

Hello all -

I would also like to see sparse matrix support for this method.

tomwenseleers commented 12 months ago

Yes, this would be very useful. Packages like glmnet and abess do support sparse covariate matrices, and it would be good to have that in ncvreg too. If the package used Rcpp with Eigen or Armadillo classes, this would be a lot easier to adapt...