Closed · jolars closed this 2 years ago
It sometimes works quite well; other times it makes no difference.
There is of course a downside in that we have to store an object that is possibly as large as $X$, but since this can easily be turned off I don't see a problem.
It does not seem to help with large sparse matrices. I think keeping X_reduced, which has the same shape as X, in memory hurts too much.
@JonasWallin and I discussed this, and I think that if we want to do it for sparse matrices we need to keep X_reduced sparse, but we cannot keep X_reduced
as a regular sparse matrix (CSC). We would need to store each reduction as two arrays: one with the nonzero indices and one with the values. It should be doable, but I'm not sure how efficient it will end up being.
I've removed the caching of updates for the sparse case for the time being. If we want to include this we'll have to come up with something that maintains sparsity. What we would want is a list where each item holds the indices and the values for one of the reductions, and to update that in place. That won't work with numba, though, and I'm not sure that even keeping two separate lists (one with indices and one with values) works with numba, so I'm not completely sure it's worth pursuing.
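For reference, the index/value representation of a single reduction could be sketched like this (function and variable names are illustrative, not from the PR; this ignores the numba compatibility question entirely):

```python
import numpy as np
import scipy.sparse as sparse

def sparse_reduction(X, cluster, signs):
    """Compute the reduction X[:, cluster] @ signs for a sparse CSC X,
    returned as two arrays (nonzero row indices, nonzero values) so the
    result stays sparse rather than a dense n-vector."""
    v = X[:, cluster] @ signs      # scipy returns a dense 1-D array here
    idx = np.flatnonzero(v)
    return idx, v[idx]

# Example on a tiny sparse design matrix
X = sparse.csc_array(np.array([[1.0, 0.0, 2.0],
                               [0.0, 0.0, 0.0],
                               [3.0, 4.0, 0.0]]))
idx, vals = sparse_reduction(X, [0, 2], np.array([1.0, -1.0]))
# idx -> [0, 2], vals -> [-1.0, 3.0]
```

Whether updating these pairs in place ends up cheaper than recomputing from X is exactly the efficiency question above.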
This PR caches $X_C s_C$ and $(X_C s_C)^T (X_C s_C)$, updating them whenever needed. It currently only works for dense $X$, but @JonasWallin and I have discussed that it should be possible to do this for sparse $X$ too. I've spent way too much time on this already, though, so if anyone else is willing to champion it, please go ahead.
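A rough sketch of what the cached quantities look like in the dense case (names here are hypothetical, not the PR's actual API):

```python
import numpy as np

def cache_cluster_reduction(X, cluster, signs):
    """Cache v = X_C s_C and its squared norm v^T v = (X_C s_C)^T (X_C s_C)
    for a cluster C; updates along s_C can then reuse v instead of
    touching the columns X_C again."""
    v = X[:, cluster] @ signs
    return v, v @ v

# Example usage on a tiny dense matrix
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v, sq_norm = cache_cluster_reduction(X, [0, 1], np.array([1.0, -1.0]))
# v -> [-1.0, -1.0], sq_norm -> 2.0

# e.g. a residual update after changing the cluster coefficient by delta
# only needs the cached vector:  r <- r - delta * v
```

The memory cost is one n-vector per cluster, which is what makes the dense cache potentially as large as $X$ itself.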