Check indexing arg types

ilan-gold commented 6 months ago

Please describe your wishes and possible alternatives to achieve the desired result.

Working on https://github.com/scverse/anndata/issues/1224, we discovered that np.where is called internally by scipy on boolean masks, but there are places in our code where we also call it, likely unnecessarily. It would be good to review these places to see if they are still necessary and then see if there is an effect on performance.

ivirshup commented 6 months ago

My main concern here is whether doing adata[adata.obs[...] == ...].X will hit the optimized path for indexing if X is backed (since we only optimized for mask indexing, and not positional)

ivirshup commented 6 months ago

It does not, which kind of defeats the optimization. This now needs doing.

Confirmed by adding a print(row, col) to BaseCompressedSparseDataset.__getitem__ and running:

import anndata as ad, numpy as np, zarr
from scipy import sparse

rng = np.random.default_rng()

g = zarr.open()
X = sparse.random(10_000, 1_000, density=0.01, format="csr", random_state=rng)
ad.experimental.write_elem(g, "X", X, dataset_kwargs={"chunks": 1_000})
X_backed = ad.experimental.sparse_dataset(g["X"])

bool_idx = np.zeros(X.shape[0], dtype=bool)
bool_idx[5000:] = True
bool_idx[:1000] = True

adata = ad.AnnData(X=X_backed)
adata[bool_idx].X

[   0    1    2 ... 9997 9998 9999] slice(None, None, None)

<6000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 59999 stored elements in Compressed Sparse Row format>

flying-sheep commented 6 months ago

slight aside: np.where is designed for np.where(cond, vals_if_true, vals_if_false), using np.where(cond)[0] as replacement for np.flatnonzero(cond) is awkward.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

flying-sheep commented 1 month ago

@ilan-gold what’s missing here?

scverse / anndata

Check indexing arg types #1293

Please describe your wishes and possible alternatives to achieve the desired result.