scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Highly variable genes for sparse dataset in backed mode #2764

Closed. siberianisaev closed this 8 months ago

siberianisaev commented 12 months ago

What happened?

An exception is raised when running scanpy's highly_variable_genes on a sparse dataset loaded in backed mode.

Minimal code sample

import anndata
import scanpy as sc
from scipy.sparse import issparse

# read backed (file_path points to an .h5ad file whose X is stored sparse)
adata = anndata.read_h5ad(file_path, backed='r')
X = adata.raw.X if adata.raw is not None else adata.X
# the dataset must be sparse here
print(issparse(X[0]))
# calculate dispersions
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5, inplace=False)

True


### Error output

```pytb
loop of ufunc does not support argument 0 of type SparseDataset which has no callable expm1 method!
```

The error originates from https://github.com/scverse/scanpy/blob/bc349b999be62196aa51b59db6e2daa37f428322/scanpy/preprocessing/_highly_variable_genes.py#L206

Versions

```
anndata 0.8.0
scanpy 1.9.1
-----
PIL 9.3.0
appnope 0.1.3
asttokens NA
backcall 0.2.0
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.3
decorator 5.1.1
entrypoints 0.4
executing 1.2.0
google NA
h5py 3.7.0
igraph 0.10.2
ipykernel 6.17.1
jedi 0.18.2
joblib 1.2.0
kiwisolver 1.4.4
leidenalg 0.9.0
llvmlite 0.39.1
louvain 0.8.0
matplotlib 3.6.2
mpl_toolkits NA
natsort 8.2.0
numba 0.56.4
numpy 1.23.5
packaging 21.3
pandas 1.2.1
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 2.5.4
plotly 5.11.0
prompt_toolkit 3.0.33
psutil 5.9.4
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.8.0
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.13.0
pyparsing 3.0.9
pytz 2022.6
scipy 1.9.3
session_info 1.0.0
setuptools 62.3.2
sitecustomize NA
six 1.16.0
sklearn 1.1.3
stack_data 0.6.1
texttable 1.6.6
threadpoolctl 3.1.0
tornado 6.2
traitlets 5.5.0
typing_extensions NA
wcwidth 0.2.5
yaml 6.0
zipp NA
zmq 24.0.1
-----
IPython 8.6.0
jupyter_client 7.4.7
jupyter_core 5.0.0
-----
Python 3.9.13 (main, May 24 2022, 21:28:31) [Clang 13.1.6 (clang-1316.0.21.2)]
macOS-13.4-x86_64-i386-64bit
-----
```

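For readers wondering where the message in the error output comes from, here is a minimal sketch that uses only NumPy. The classes `BackedStandIn` and `WithExpm1` are hypothetical stand-ins invented for illustration; they are not anndata's `SparseDataset`. The sketch assumes the behaviour of recent NumPy releases: when a ufunc such as `np.expm1` receives an object it cannot convert to a numeric array, it falls back to calling a method of the same name on that object, and raises if no such method exists.

```python
import numpy as np


class BackedStandIn:
    """Hypothetical wrapper around some data; NOT an anndata class.

    It deliberately exposes no ``expm1`` method, mimicking an on-disk
    matrix wrapper that NumPy cannot turn into a numeric array.
    """

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)


class WithExpm1(BackedStandIn):
    """Same hypothetical wrapper, but with an ``expm1`` method."""

    def expm1(self):
        # Delegate to NumPy on the underlying in-memory values.
        return WithExpm1(np.expm1(self.data))


def try_expm1(obj):
    """Return the result of np.expm1(obj), or the exception it raised."""
    try:
        return np.expm1(obj)
    except (TypeError, AttributeError) as err:
        # Recent NumPy raises TypeError here; older releases may differ.
        return err


if __name__ == "__main__":
    # No expm1 method: NumPy's object fallback has nothing to call.
    print(repr(try_expm1(BackedStandIn([0.0, 1.0]))))
    # With an expm1 method, the same ufunc call is dispatched to it.
    print(type(try_expm1(WithExpm1([0.0, 1.0]))).__name__)
```

This only illustrates the dispatch mechanism behind the message; it says nothing about how scanpy or anndata should handle backed data.
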
ivirshup commented 8 months ago

I don't think we're going to get this implemented for sparse datasets per se, but we have implemented this for dask arrays wrapping the sparse dataset in