scikit-learn-contrib / scikit-matter

A collection of scikit-learn compatible utilities that implement methods born out of the materials science and chemistry communities
https://scikit-matter.readthedocs.io/en/v0.2.0/
BSD 3-Clause "New" or "Revised" License
70 stars 18 forks source link

Rank-one updates and other potential performance gains for CUR #216

Open agoscinski opened 10 months ago

agoscinski commented 10 months ago

This is a revive of the draft PR https://github.com/scikit-learn-contrib/scikit-matter/pull/86 (please look into it for further information) because I think it is worth to look into this more given that CUR outperforms FPS by far in regression quality and is often not used because it is so expensive to compute.

The core idea is to update the eigenvectors after a selection instead of recomputing them by an eigendecomposition. @ceriottm mentioned in a discussion that it was mathematically unstable for eigenvectors corresponding to degenerated eigenvalues. So this deserves some dedicated time look into this in detail.

Links:

ceriottm commented 10 months ago

Ok I thought a bit more and I think I recall better what the issue was.

  1. these are some algorithms for rank-1 updates http://doi.org/10.1137/S089547989223924X and http://doi.org/10.1007/BF01396012 - you can see it's kind of a pain to get the eigenvectors
  2. the issue here is that for a big matrix we don't want to compute ALL eigenvectors, but only the top ones. this can be done efficiently, but the problem is then how to do a rank-one update of only some of the eigenvectors. this is the only article I had found for this, but couldn't find an implementation https://epubs.siam.org/doi/epdf/10.1137/18M1172120