Open gokceneraslan opened 5 years ago
That sum behavior is real weird. I'm pretty surprised numpy doesn't throw an error about that one. I'd go as far as to rename this issue to be about that, since I know filtering sparse backed datasets probably won't work due to how subsetting is defined (it has a call to .copy()
, which should throw an error for any backed dataset).
Did calling sum work before we had the big scipy update?
See also theislab/scanpy#650
This operation is poorly defined for our current backed mode.
What would the expected result be? Does it return a view in backed mode? Do we need to write a new copy of the object somewhere?
For the time being, we should at least raise a better error message than the technical one people get now. Maybe just raise NotImplementedError('filter_genes does not yet(?) work in backed mode')
Maybe with your error you can give the suggestion mentioned here: https://github.com/theislab/scanpy/issues/650#issuecomment-499492569
This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!
Right now
sc.pp.filter_genes(adata.X, min_counts=10)
works correctly for dense file-backed anndata sincenp.sum(adata.X)
here is supported by h5py.Dataset (see https://github.com/h5py/h5py/blob/master/h5py/_hl/dataset.py#L762).However the same call on a SparseDataset fails with the following error:
because
np.sum(adata.X)
weirdly returns a SparseDataset without reducing any dimensions:I can think of two solutions:
Set the
__array_ufunc__
method of SparseDatset (https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array_ufunc__) toNone
which would state explicitly that numpy operations are not supported. Next, we can print a proper error in filter_cells/filter_genes which states such operations are not supported.Implement a
sum
function (which loads the data into memory) which would enable us to perform gene/cell filtering options. We can print a warning here and inform the user that data is loaded to the memory. Something like following works: