theislab / ehrapy

Electronic Health Record Analysis with Python.
https://ehrapy.readthedocs.io/
Apache License 2.0

Enhancement/normalization dask #763

Closed eroell closed 3 months ago

eroell commented 3 months ago

PR Checklist

Description of changes

Allow normalization methods to work with (dense) dask arrays. Suggest installing ehrapy[dask] for dependency management; dask might become a hard dependency in the future.
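For intuition, here is a minimal sketch (with a hypothetical scale_norm_sketch helper, not the PR's actual implementation) of why dense dask arrays can flow through numpy-style normalization: the mean/std reductions and the arithmetic dispatch to the input array's own type, so dask inputs stay lazy until .compute() is called.

import numpy as np
import dask.array as da

def scale_norm_sketch(X):
    # Works for numpy and dask arrays alike: the operations dispatch
    # to the array type's own implementations, so for dask nothing
    # is computed until .compute() is called.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

X_np = np.random.default_rng(0).random((100, 5))
X_da = da.from_array(X_np, chunks=20)

out = scale_norm_sketch(X_da)  # still a lazy dask array
assert isinstance(out, da.Array)
assert np.allclose(out.compute(), scale_norm_sketch(X_np))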

Technical details

Additional context

Example, profiled with Scalene (run the Python scripts below as scalene <scriptname>.py), demonstrating that ep.pp.scale_norm does not trigger the computation and is not a performance bottleneck:

In memory (numpy) array

import scalene
scalene.scalene_profiler.stop()
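# Pause profiling during data setup; profiling is restarted just before the analysis calls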
import pandas as pd
from sklearn.datasets import make_blobs
import ehrapy as ep
import anndata as ad
import scanpy as sc
n_individuals = 50000
n_features = 1000
n_groups = 4
data_features, data_labels = make_blobs(n_samples=n_individuals, n_features=n_features, centers=n_groups, random_state=42)
var = pd.DataFrame({"feature_type": ["numeric"] * n_features})
adata = ad.AnnData(X=data_features, obs={"label": data_labels}, var=var)
scalene.scalene_profiler.start()
ep.pp.scale_norm(adata)
ep.pp.pca(adata)
sc.pp.neighbors(adata)
ep.tl.leiden(adata)
ep.pl.pca(adata, color="leiden", save="profiling_memory_pca.png")
scalene.scalene_profiler.stop()

[Figure: Scalene profile of the in-memory run (memory_profile_50000x1000)]

Out-of-core (dask) array

import scalene
scalene.scalene_profiler.stop()
import dask.array as da
from sklearn.datasets import make_blobs
import ehrapy as ep
import anndata as ad
import pandas as pd
import scanpy as sc
n_individuals = 50000
n_features = 1000
n_groups = 4
chunks = 1000
data_features, data_labels = make_blobs(n_samples=n_individuals, n_features=n_features, centers=n_groups, random_state=42)
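# Wrap the in-memory data in a chunked dask array to emulate lazily evaluated, out-of-core input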
data_features = da.from_array(data_features, chunks=chunks)
var = pd.DataFrame({"feature_type": ["numeric"] * n_features})
adata = ad.AnnData(X=data_features, obs={"label": data_labels}, var=var)
scalene.scalene_profiler.start()
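# Resume profiling: everything from here on operates on the dask-backed AnnData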
ep.pp.scale_norm(adata)
ep.pp.pca(adata)
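# Materialize the lazy PCA result so downstream scanpy functions can consume it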
adata.obsm["X_pca"] = adata.obsm["X_pca"].compute()
sc.pp.neighbors(adata)
sc.tl.leiden(adata)
sc.pl.pca(adata, color="leiden", save="profiling_out_of_core_pca.png")
scalene.scalene_profiler.stop()

[Figure: Scalene profile of the out-of-core run (out_of_core_profile_50000x1000)]
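Continuing from the script above, a quick hedged check (assuming the PR keeps adata.X as a dask array after normalization) that scale_norm indeed stays lazy:

import dask.array as da

ep.pp.scale_norm(adata)
# If normalization were eager, adata.X would have been materialized to numpy here.
assert isinstance(adata.X, da.Array)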

eroell commented 3 months ago

Weird, the pull_request_target check still seems to appear, even though we switched to the pull_request trigger in run_notebooks.yml. But I can see that the notebook runs triggered on pull_request, which should work, actually pass.

It might disappear once this is merged; let's see how it behaves in future PRs. Big thanks @flying-sheep @ilan-gold.