pbreheny / biglasso

biglasso: Extending Lasso Model Fitting to Big Data in R
http://pbreheny.github.io/biglasso/

Sparse matrices #4

Open dselivanov opened 7 years ago

dselivanov commented 7 years ago

Great work. Is it possible to extend the package to allow sparse matrices as input?

YaohuiZeng commented 7 years ago

Thanks for your comment. Yes, that is definitely on my to-do list. I am currently busy with research papers. Hopefully I will be able to add this feature by the end of next month.

dselivanov commented 7 years ago

It would also be great to have benchmarks against SGD-based optimizations. I implemented the FTRL algorithm here: https://github.com/dselivanov/FTRL. I found it blazing fast, both in convergence speed and in runtime per non-zero element of the input matrix.
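
For reference, here is a minimal plain-R sketch of the standard per-coordinate FTRL-Proximal update for logistic regression (the textbook update from McMahan et al. 2013, not the implementation in the repo above; `alpha`, `beta`, `lambda1`, `lambda2` are the usual hyper-parameters):

```r
# Toy single-pass FTRL-Proximal for logistic regression.
# X: numeric matrix (rows = observations), y: 0/1 labels.
ftrl_fit <- function(X, y, alpha = 0.1, beta = 1, lambda1 = 1, lambda2 = 0) {
  p <- ncol(X)
  z <- numeric(p); n <- numeric(p); w <- numeric(p)
  for (t in seq_len(nrow(X))) {
    x  <- X[t, ]
    nz <- which(x != 0)                      # only touch non-zero features
    # closed-form weights for the active coordinates
    w[nz] <- ifelse(abs(z[nz]) <= lambda1, 0,
                    -(z[nz] - sign(z[nz]) * lambda1) /
                      ((beta + sqrt(n[nz])) / alpha + lambda2))
    p_hat <- 1 / (1 + exp(-sum(w[nz] * x[nz])))  # predicted probability
    g     <- (p_hat - y[t]) * x[nz]              # gradient of the log-loss
    sigma <- (sqrt(n[nz] + g^2) - sqrt(n[nz])) / alpha
    z[nz] <- z[nz] + g - sigma * w[nz]
    n[nz] <- n[nz] + g^2
  }
  w
}
```

The cost per observation is proportional to the number of non-zero features, which is why the runtime scales with the non-zero count of the input matrix.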

hutaohutc commented 7 years ago

Nice work! How can I convert a sparse matrix to a big.matrix?

privefl commented 7 years ago

@hutaohutc Your question is more a bigmemory question than a biglasso one.

privefl commented 7 years ago

@hutaohutc As I said, your question will get more attention if you open an issue in the bigmemory repo or ask on Stack Overflow (with the tag r-bigmemory).

mm3509 commented 6 years ago

@YaohuiZeng I am also interested in having the package take sparse matrices as input, as my matrix is currently too big to fit on disk.

privefl commented 6 years ago

What kind of data do you have? What are the dimensions? If your data is very sparse, it means you have variables with very low variability. Maybe try to identify only the columns with some decent variability and put those in a big.matrix, as in the sketch below.

What do you think?
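
A minimal sketch of what I have in mind, assuming your data is a dgCMatrix `X`, you have a response `y`, and you use the bigmemory package (the variance threshold is only illustrative):

```r
library(Matrix)
library(bigmemory)

# Column variances computed directly on the sparse matrix
mu  <- Matrix::colMeans(X)
ex2 <- Matrix::colMeans(X * X)   # elementwise square stays sparse
v   <- ex2 - mu^2

keep <- which(v > 1e-4)          # illustrative threshold
# Densify only the retained columns and copy into a big.matrix
bm <- as.big.matrix(as.matrix(X[, keep]), type = "double")

fit <- biglasso::biglasso(bm, y)
```

This only works if the number of retained columns is small enough that the dense submatrix fits in memory or on disk.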

mm3509 commented 6 years ago

I want to estimate a vector auto-regression (VAR) on 10 years (120 months) of US county-level data. I have 120 time periods * 3,118 mainland counties = 374,160 data points. If I were estimating the VAR county by county, I would select one county, run a LASSO regression on all the others, and obtain 3,118 ^ 2 parameters (3,118 parameters for each county on the left-hand side). But I want to estimate it in one go, so I have a matrix of explanatory variables with 1 million regressors, which weighs about 27 TB (although it's 99.997% sparse). The reason to estimate it in one go is that later I want to impose a certain structure on the variance-covariance matrix of errors and do a kind of Feasible Generalized Least Squares.

So, yes, I could find work-arounds to estimate some approximation of the VAR parameters. But I really want to run the whole thing at once...
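
To make the sizes concrete, here is a rough sketch of the stacked design I have in mind, with the dimensions scaled down for illustration. It uses Matrix::bdiag for the block-diagonal structure and glmnet (which accepts sparse dgCMatrix input) as a stand-in, since biglasso does not take sparse matrices yet:

```r
library(Matrix)
library(glmnet)

n_t <- 120    # time periods
n_c <- 50     # counties (scaled down from 3,118 for illustration)

# Lagged county series: n_t x n_c matrix of predictors
X_lag <- matrix(rnorm(n_t * n_c), n_t, n_c)

# Stack the n_c equations into one block-diagonal sparse design:
# (n_t * n_c) rows by n_c^2 columns, non-zero fraction = 1 / n_c
X_big <- bdiag(replicate(n_c, Matrix(X_lag, sparse = TRUE), simplify = FALSE))
y_big <- rnorm(n_t * n_c)        # stacked left-hand-side series

fit <- glmnet(X_big, y_big)      # glmnet handles dgCMatrix directly
```

With the full 3,118 counties the same construction gives the dense size and sparsity I described above, which is why sparse support in biglasso would matter here.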

TuSKan commented 6 years ago

@miguelmorin Take a look at this new package for estimating VAR models: http://www.wbnicholson.com/BigVAR.html