Open ilan-gold opened 11 months ago
Previous discussion of `scipy.sparse.*_array` classes:
I think there are other good reasons to move away from inheritance for these functions. The code in scipy is inflexible, and assumes in-memory objects. This generally does not fit our usage patterns and can lead to very poor performance (e.g. loading an entire array into memory).
The relevant changes in sparse array semantics are:
- `csr_array` instead of `csr_matrix`
- `scipy.sparse.sparray` subclasses, but should return a 1d coo array in the next release of scipy (~ four months from now)

There's a question of compatibility with current support. I would suggest that most of the behavior is the same, so we could wrap the new implementation in a class that returns matrices and implements 1d indexing.
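As a rough illustration of the "class that returns matrices" idea, here is a minimal sketch of a conversion shim. The helper name `as_legacy_matrix` is hypothetical and not part of anndata or scipy; it only relies on `scipy.sparse.issparse`, the `.format` attribute, and the public `*_matrix` constructors. It assumes 2d input and does not handle the 1d case mentioned above.

```python
import numpy as np
from scipy import sparse


def as_legacy_matrix(x):
    """Convert a 2d sparse array to the *_matrix class of the same format.

    Hypothetical helper for illustration only; 1d results are not handled.
    """
    if not sparse.issparse(x):
        raise TypeError(f"expected a scipy sparse object, got {type(x).__name__}")
    # x.format is a short string such as "csr" or "csc"
    matrix_cls = getattr(sparse, f"{x.format}_matrix")
    return matrix_cls(x)


arr = sparse.csr_array(np.array([[1, 0], [0, 2]]))
mat = as_legacy_matrix(arr)
print(type(mat).__name__)
```

A wrapper could call something like this on every return value to keep existing matrix-based callers working while the internals move to arrays.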
One other point about returning different array types: we'd ideally also like to be able to return `cupyx.scipy.sparse` arrays.
Maybe we need something like zarr's `meta_array` here?
Hi, it seems sparse arrays are deprecating `.getnnz`, which is used in Scanpy and Muon. If moving to sparse arrays is on the roadmap for AnnData, it could be worth checking that it doesn't break things there! https://github.com/scverse/scanpy/issues/2773
As an update, I believe `getnnz` is being un-deprecated, as no replacement has been proposed.
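For code that wants to avoid depending on `getnnz` either way, per-axis counts for CSR data can be read off the index arrays directly. The sketch below is one possible approach, not something Scanpy or Muon necessarily does; `stored_counts` is a hypothetical name. It counts stored entries, so explicitly stored zeros are included.

```python
import numpy as np
from scipy import sparse


def stored_counts(x, axis=None):
    """Count stored entries of a CSR matrix/array without calling getnnz.

    Hypothetical helper for illustration; only the CSR format is handled.
    """
    if x.format != "csr":
        raise ValueError("this sketch only handles CSR input")
    if axis is None:
        # the last indptr entry is the total number of stored values
        return int(x.indptr[-1])
    if axis in (1, -1):
        # consecutive indptr differences give the count for each row
        return np.diff(x.indptr)
    if axis in (0, -2):
        # tally how often each column index occurs
        return np.bincount(x.indices, minlength=x.shape[1])
    raise ValueError(f"invalid axis: {axis!r}")


X = sparse.csr_array(np.array([[1, 0, 2], [0, 0, 3], [0, 0, 0]]))
print(stored_counts(X), stored_counts(X, axis=1), stored_counts(X, axis=0))
```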
Writing up a discussion with @ivirshup:
Scipy is moving away from their matrix API towards the array API (see note here). Consequently, or perhaps semi-coincidentally, the internal functions whose signatures, names, and behavior we rely on are going to be even more in flux (e.g., `_get_intXslice` in `backed` mode classes).

For this reason it perhaps makes sense to move away from inheriting from scipy's internal classes and towards our own wrappers. While this may sound like more code maintenance, two things will probably put a finite limit on how much we have to do:
- `getSliceXInt` on `csr_matrix`. People probably should not be doing this in `backed` mode anyway, and so we can probably safely throw an error.
- `xxx_matrix` classes directly from scipy at the moment. Thus throwing an error or changing behavior should not be too much of an issue, with the exception of the return type.

Therefore moving forward we should probably investigate
- `XXXDataset`) to be array instead of matrix
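To make the wrapper idea concrete, here is a minimal sketch of composition instead of inheritance. Everything in it is an assumption for illustration: `BackedCSR` is a hypothetical class, not anndata's actual implementation, and the three index/data arguments stand in for on-disk datasets that support slicing. It reads only the rows requested, raises on access patterns it does not support, and lets the caller pick the return type.

```python
import numpy as np
from scipy import sparse


class BackedCSR:
    """Hypothetical stand-in for a backed CSR dataset (composition, no scipy base class)."""

    def __init__(self, data, indices, indptr, shape, container=sparse.csr_array):
        # data/indices/indptr may be any sliceable array-likes (e.g. on-disk datasets)
        self.data, self.indices, self.indptr = data, indices, indptr
        self.shape = shape
        self.container = container  # csr_array by default, csr_matrix for legacy callers

    def __getitem__(self, key):
        if not isinstance(key, slice):
            # unsupported access patterns fail loudly rather than loading everything
            raise NotImplementedError("only contiguous row slices are supported here")
        start, stop, step = key.indices(self.shape[0])
        if step != 1:
            raise NotImplementedError("only contiguous row slices are supported here")
        stop = max(stop, start)
        lo, hi = int(self.indptr[start]), int(self.indptr[stop])
        # shift the row pointers so the sub-block starts at offset zero
        indptr = np.asarray(self.indptr[start:stop + 1]) - lo
        block = (np.asarray(self.data[lo:hi]), np.asarray(self.indices[lo:hi]), indptr)
        return self.container(block, shape=(stop - start, self.shape[1]))


full = sparse.csr_array(np.array([[1, 0, 2], [0, 0, 3], [4, 5, 0]]))
backed = BackedCSR(full.data, full.indices, full.indptr, full.shape)
print(backed[1:3].toarray())
```

Passing `container=sparse.csr_matrix` would keep the old return type, which is the one compatibility concern noted above.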