yixuan / RSpectra

R Interface to the Spectra Library for Large Scale Eigenvalue and SVD Problems
http://cran.r-project.org/package=RSpectra
80 stars 12 forks source link

Warm start in svds() #6

Closed privefl closed 5 years ago

privefl commented 7 years ago

Basically, I have a matrix of thousands of samples and say 100,000 variables. Using your package make it easy and fast to get the 10 first singular values/vectors. I have an algorithm that iteratively (3 or 4 times) discards 1000 variables and I have to recompute the svd each time. In other words, I compute 4 times very similar singular vectors and I would like to know if it could be possible to use "warm starts" for the second, third and fourth iterations in order to make them way faster?

Thanks for you work.

yixuan commented 7 years ago

Yeah, this is one issue I was considering to solve, as was mentioned in https://github.com/yixuan/spectra/issues/3. However currently I don't find a good algorithm to make use of the warm starts. If you happen to know some theoretical results on this, just let me know. Thanks.

privefl commented 7 years ago

I didn't study well the algorithm but doesn't it use some random vectors at the start? In an ideal world, I thought that using warm starts would just consists in using previous resulting (left?right?) singular vectors rather than some random vectors.

If you don't have time to look into this, please provide some tips where I can go look into to implement this behaviour. I really need this to develop new methods that I want to publish.

privefl commented 5 years ago

I'm still highly interested in this feature. Any tip on this?

yixuan commented 5 years ago

@privefl From what I have learned, it seems the Krylov space method that Spectra is using only takes one initial vector. This may not be good for implementing the warm start for SVD or eigen decomposition.

A promising algorithm to make use of existing knowledge is the Jacobi-Davidson method, which is implemented in the PRIMME package. In fact its svds() function already has this feature. Maybe you can have a try?

privefl commented 5 years ago

This looks very promising indeed! Yet, I can't install the R package (some missing #include "primme.h"). I'll look at it in details next week, thanks.

yixuan commented 5 years ago

It is already on CRAN. Any problem during the installation?

privefl commented 5 years ago

Yes, I tried

library(devtools)
install_github("primme/primme", subdir="R")
yixuan commented 5 years ago

What about directly install.packages("PRIMME")?

privefl commented 5 years ago

CRAN installation works fine, thanks.

privefl commented 5 years ago

Just for your information, I tested it with this use-case: Matrix of 1500 x 50K. I drop 500 variables and want to recompute SVD. I tried providing both u and v (subsetted) from previous iteration, and only one of them. It was always a bit slower than not providing any starting guess.

However, {PRIMME} is about 1.5x as fast as {RSpectra} (for my use-case, and maybe just a different interpretation of the tolerance parameter).

yixuan commented 5 years ago

Yeah. The result seems to be consistent with the experiments here. RSpecra tends to use more iterations to achieve higher accuracy, and it would be interesting to also compare the residual errors in your case. For myself I always did some pilot benchmarking on my data to determine which package to use. :joy:

And the convergence of eigenvalue computation is really... mysterious.

privefl commented 5 years ago

The accuracy is better for {RSpectra} 1e-9 vs 1e-6 I think. Cf. https://github.com/privefl/bigsnpr/blob/bedpca/tmp-tests/bench-PRIMME2.R.

I should look at this preconditioner thing, if I can use it to fasten by iterative procedure or not.