Closed bkmartinjr closed 1 year ago
Double slicing in backed mode (taking a view of a view in backed mode) is not allowed and we now throw an error: https://github.com/theislab/anndata/commit/2b622f4518d670c3cddd3a861a1a718d13575c15
Why don't you do adata_backed[0:2, 0:2]?
We are using boolean slicing to allow for complex filtering, and currently indexing both axes at once with selectors that are not integers or slices is not allowed.
In other words, this throws an error:
obs_selector=np.array([True, False, ...])
vars_selector=np.array([False, True, ...])
adata_backed[obs_selector, vars_selector]
And the error message suggests that we try double slicing.
OK! Right, double-slicing in memory mode works fine and is currently the only (not nice) way to get submatrices from boolean vectors. In backed-mode, it's quite a bit trickier.
In any case: if you need this, I'll implement the functionality, maybe even tonight. Then double slicing won't be necessary anymore.
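For reference, here is a minimal sketch of what double slicing with boolean vectors does in memory. It uses a plain NumPy array rather than anndata itself, and the names `X`, `obs_mask` and `var_mask` are made up for illustration. Passing both masks in a single index pairs them up elementwise, which is not the submatrix; slicing one axis after the other gives the expected result.

```python
import numpy as np

# Toy stand-in for a data matrix: 3 observations x 4 variables (illustrative only).
X = np.arange(12).reshape(3, 4)
obs_mask = np.array([True, False, True])
var_mask = np.array([False, True, True, False])

# Both masks in one index: NumPy pairs the selected positions elementwise,
# i.e. it picks X[0, 1] and X[2, 2], not a 2 x 2 block.
paired = X[obs_mask, var_mask]

# Double slicing: filter rows first, then columns, to get the submatrix.
double_sliced = X[obs_mask][:, var_mask]

print(paired.shape, double_sliced.shape)
```

The intermediate `X[obs_mask]` is a full copy of the selected rows, which is cheap in memory but is part of why the same trick is harder to support on a file-backed matrix.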
At the moment, we can work around it and don't see a need for you to implement this urgently. I ran into the bug because I was benchmarking to determine optimal ways to use anndata in cellxgene. I think the best path would be for us to ship our "MVP", and then have a chat with you about performance. Backed mode will either be useful or not, based upon that. Seem reasonable?
Sounds very reasonable! Let's discuss!
Meanwhile, I think the submatrix extraction via slicing should be relatively straightforward to get via np.ix_() applied to the data matrix, and everything else stays as is. As we discussed this already ages ago and you worked quite a bit on the indexing at the time, @flying-sheep, any bandwidth for doing this? It's essentially only making sure that the index normalization produces non-slices and handles pd.Index objects appropriately.
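A rough sketch of that idea on a plain NumPy matrix, not the actual anndata implementation: the helpers `to_positions` and `submatrix` are hypothetical names, and label-based `pd.Index` handling is left out. Each selector is first normalized to an integer position array (so no slices remain), and `np.ix_` then builds the open mesh that selects the full rows-by-columns block.

```python
import numpy as np

def to_positions(selector, length):
    """Normalize a slice, boolean mask, or integer sequence to integer positions.

    Hypothetical helper for illustration only.
    """
    if isinstance(selector, slice):
        return np.arange(length)[selector]
    selector = np.asarray(selector)
    if selector.dtype == bool:
        if selector.shape != (length,):
            raise IndexError("boolean selector length does not match the axis")
        return np.flatnonzero(selector)
    return selector.astype(np.intp)

def submatrix(X, obs_selector, var_selector):
    """Extract the block of X addressed by the two selectors in one step."""
    rows = to_positions(obs_selector, X.shape[0])
    cols = to_positions(var_selector, X.shape[1])
    return X[np.ix_(rows, cols)]

X = np.arange(12).reshape(3, 4)
block = submatrix(X, [True, False, True], slice(1, 3))
print(block)
```

Normalizing to integer arrays up front keeps the final indexing step identical for every selector type, which is the part that "stays as is".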
Progress was definitely made here, but I'm not sure this issue is totally solved. Double "fancy" indexing over multiple axes isn't supported by h5py datasets. This does work with backed anndata sparse matrices (at least on master).
Side note: It might be possible for zarr dense arrays via get_orthogonal_selection.
We throw a meaningful error here and if we ever start supporting it, we’ll announce it.
Test case:
Traceback of final line: