Rank-one updates and other potential performance gains for CUR

scikit-learn-contrib / scikit-matter

A collection of scikit-learn compatible utilities that implement methods born out of the materials science and chemistry communities

BSD 3-Clause "New" or "Revised" License


This revives the draft PR https://github.com/scikit-learn-contrib/scikit-matter/pull/86 (please see it for further information). I think this is worth looking into further: CUR outperforms FPS by far in regression quality, but it is often not used because it is so expensive to compute.

The core idea is to update the eigenvectors after each selection instead of recomputing them with a full eigendecomposition. @ceriottm mentioned in a discussion that this update is mathematically unstable for eigenvectors corresponding to degenerate eigenvalues, so it deserves some dedicated time to look into in detail.
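Why a rank-one update applies at all can be sketched as follows, assuming that (as in the usual CUR feature-selection scheme) each selection orthogonalizes the columns of X against the selected column j. Then the Gram matrix C = X^T X changes by a rank-one downdate, C' = C - c c^T / C_jj with c = C[:, j], so its eigenvectors (the right singular vectors of X) could in principle be updated rather than recomputed. The helper names below are hypothetical and not part of the scikit-matter API.

```python
import numpy as np


def orthogonalize_columns(X, j):
    """Hypothetical helper: project every column of X onto the orthogonal
    complement of column j (the column that was just selected)."""
    x = X[:, j]
    return X - np.outer(x, x @ X) / (x @ x)


def gram_downdate(C, j):
    """Hypothetical helper: rank-one downdate C - c c^T / C[j, j], c = C[:, j]."""
    c = C[:, j]
    return C - np.outer(c, c) / C[j, j]


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
C = X.T @ X
j = 3

X_orth = orthogonalize_columns(X, j)
C_recomputed = X_orth.T @ X_orth  # what a full recomputation starts from
C_updated = gram_downdate(C, j)   # the same matrix via a rank-one update of C
print(np.allclose(C_recomputed, C_updated))  # True
```

In other words C' = C + rho u u^T with rho = -1/C_jj and u = C[:, j], which is a symmetric rank-one modification of a matrix whose eigendecomposition is already known.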

Links:

Please see the links in the closed draft PR.
Math explaining the core idea: https://math.stackexchange.com/a/3625609
LAPACK function that might be required: https://www.netlib.org/lapack/explore-html/d2/d24/group__aux_o_t_h_e_rcomputational_ga3c4a943599132aea3ac964c08392853a.html
I am not aware of any Python bindings for this LAPACK function, but a similar function exists in SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lapack.dlasd4.html
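To make the degeneracy problem concrete, here is a minimal sketch of the standard secular-equation approach to a symmetric rank-one eigenvalue update. Given A = Q diag(d) Q^T and z = Q^T u, the eigenvalues of A + rho u u^T are the roots of f(lam) = 1 + rho * sum_k z_k^2 / (d_k - lam), and each eigenvector is proportional to Q (diag(d) - lam I)^{-1} z. As a stand-in for the LAPACK routine, the roots are found with `scipy.optimize.brentq`; the function names are hypothetical. The sketch assumes distinct eigenvalues and no zero entries in z. It performs no deflation, which is exactly where degenerate eigenvalues cause trouble. A downdate (rho < 0), as an orthogonalization step in CUR would produce, is handled by negating and reversing the problem.

```python
import numpy as np
from scipy.optimize import brentq


def _secular_eigs(d, z, rho):
    """Eigenpairs of diag(d) + rho z z^T for rho > 0 and ascending, distinct d."""
    n = d.size

    def f(lam):
        # Secular function; its roots are the updated eigenvalues.
        return 1.0 + rho * np.sum(z**2 / (d - lam))

    lam = np.empty(n)
    for i in range(n):
        last = i == n - 1
        # Interlacing: d[i] < lam[i] < d[i + 1], and lam[-1] <= d[-1] + rho * |z|^2.
        lo = d[i]
        hi = d[-1] + rho * (z @ z) if last else d[i + 1]
        eps = 1e-10 * (hi - lo)
        lam[i] = brentq(f, lo + eps, hi if last else hi - eps, xtol=1e-14)
    # Eigenvector i is proportional to (diag(d) - lam[i] I)^{-1} z.
    V = z[:, None] / (d[:, None] - lam[None, :])
    V /= np.linalg.norm(V, axis=0)
    return lam, V


def rank_one_eigh_update(d, Q, u, rho):
    """Hypothetical helper: eigenpairs of Q diag(d) Q^T + rho u u^T.

    Assumes d is ascending and distinct and Q^T u has no zero entries;
    no deflation is done, so degenerate eigenvalues are NOT handled.
    """
    if rho == 0:
        return d.copy(), Q.copy()
    z = Q.T @ u
    if rho > 0:
        lam, V = _secular_eigs(d, z, rho)
    else:
        # Negate and reverse so the rho > 0 solver applies, then undo both.
        mu, W = _secular_eigs(-d[::-1], z[::-1], -rho)
        lam, V = -mu[::-1], W[::-1, ::-1]
    return lam, Q @ V


rng = np.random.default_rng(42)
B = rng.standard_normal((6, 6))
A = B + B.T
d, Q = np.linalg.eigh(A)
u = rng.standard_normal(6)
for rho in (0.7, -0.7):
    lam, V = rank_one_eigh_update(d, Q, u, rho)
    A_new = A + rho * np.outer(u, u)
    print(rho, np.allclose(lam, np.linalg.eigvalsh(A_new)),
          np.allclose(V @ np.diag(lam) @ V.T, A_new))
```

The elementwise eigenvector formula is where the instability shows up: when two d_k nearly coincide or a z_k is tiny, the bracketing intervals collapse and the computed vectors lose orthogonality. Production routines in the LAPACK family deal with this through deflation and extra care in forming the eigenvectors, which this sketch deliberately omits.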

Rank-one updates and other potential performance gains for CUR #216