scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

sc.tl.dendrogram no longer(?) works in backed mode #3199

Open flying-sheep opened 1 month ago

flying-sheep commented 1 month ago

What happened?

In #3048 we started raising errors for functions that don’t support backed mode, but it seems that a tutorial was using dendrogram in backed mode: https://scverse-tutorials.readthedocs.io/en/latest/notebooks/scverse_data_backed.html

That was probably unintentional, and the data simply got loaded into memory. But since dendrogram can be reimplemented using sc.get.aggregate, we should do that!
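To illustrate why an aggregate step is enough, here is a minimal sketch of a dendrogram computed purely from per-group means. It uses only NumPy, pandas and SciPy; the function name `dendrogram_from_group_means` and its return keys are hypothetical, and this is not scanpy's actual implementation. It assumes the dendrogram only needs one mean profile per group, followed by a correlation matrix and a hierarchical linkage.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy as sch


def dendrogram_from_group_means(X, groups, linkage_method="complete"):
    """Hierarchically cluster groups by their mean profiles (hypothetical helper).

    X: dense (n_obs, n_features) array; groups: one label per observation.
    """
    # Step 1: aggregate to one mean vector per group.
    # This is the only step that touches the full data matrix.
    means = pd.DataFrame(np.asarray(X)).groupby(np.asarray(groups)).mean()
    # Step 2: Pearson correlation between the group profiles (groups x groups).
    corr = means.T.corr(method="pearson")
    # Step 3: linkage on the small correlation matrix, then read off the leaf order.
    Z = sch.linkage(corr.to_numpy(), method=linkage_method, optimal_ordering=True)
    info = sch.dendrogram(Z, labels=list(means.index), no_plot=True)
    return {
        "linkage": Z,
        "categories_ordered": info["ivl"],
        "correlation_matrix": corr.to_numpy(),
    }
```

Everything after step 1 operates on a groups x features table, so only the aggregation itself would need to support backed data. Note that a group with a constant mean profile yields NaN correlations, which this sketch does not handle.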

Minimal code sample

```python
import scanpy as sc

adata = sc.datasets.pbmc3k()
adata.filename = "test.h5ad"
sc.pl.dotplot(adata, ["FCN1"], groupby="index", dendrogram=True)
```

Error output

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[44], line 1
----> 1 sc.pl.dotplot(mdata["rna"], var_names=["CD2"], groupby="leiden", figsize=(10, 3), dendrogram=True, swap_axes=True)

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/legacy_api_wrap/__init__.py:80, in legacy_api.<locals>.wrapper.<locals>.fn_compatible(*args_all, **kw)
     77 @wraps(fn)
     78 def fn_compatible(*args_all: P.args, **kw: P.kwargs) -> R:
     79     if len(args_all) <= n_positional:
---> 80         return fn(*args_all, **kw)
     82     args_pos: P.args
     83     args_pos, args_rest = args_all[:n_positional], args_all[n_positional:]

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/scanpy/plotting/_dotplot.py:1046, in dotplot(adata, var_names, groupby, use_raw, log, num_categories, expression_cutoff, mean_only_expressed, cmap, dot_max, dot_min, standard_scale, smallest_dot, title, colorbar_title, size_title, figsize, dendrogram, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, swap_axes, dot_color_df, show, save, ax, return_fig, vmin, vmax, vcenter, norm, **kwds)
   1019 dp = DotPlot(
   1020     adata,
   1021     var_names,
   (...)
   1042     **kwds,
   1043 )
   1045 if dendrogram:
-> 1046     dp.add_dendrogram(dendrogram_key=dendrogram)
   1047 if swap_axes:
   1048     dp.swap_axes()

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/scanpy/plotting/_baseplot_class.py:306, in BasePlot.add_dendrogram(self, show, dendrogram_key, size)
    302 self.group_extra_size = size
    304 # to correctly plot the dendrogram the categories need to be ordered
    305 # according to the dendrogram ordering.
--> 306 self._reorder_categories_after_dendrogram(dendrogram_key)
    308 dendro_ticks = np.arange(len(self.categories)) + 0.5
    310 self.group_extra_size = size

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/scanpy/plotting/_baseplot_class.py:897, in BasePlot._reorder_categories_after_dendrogram(self, dendrogram)
    894         _categories = _categories[:3] + ["etc."]
    895     return ", ".join(_categories)
--> 897 key = _get_dendrogram_key(self.adata, dendrogram, self.groupby)
    899 dendro_info = self.adata.uns[key]
    900 if self.groupby != dendro_info["groupby"]:

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/scanpy/plotting/_anndata.py:2384, in _get_dendrogram_key(adata, dendrogram_key, groupby)
   2377     from ..tools._dendrogram import dendrogram
   2379     logg.warning(
   2380         f"dendrogram data not found (using key={dendrogram_key}). "
   2381         "Running `sc.tl.dendrogram` with default parameters. For fine "
   2382         "tuning it is recommended to run `sc.tl.dendrogram` independently."
   2383     )
-> 2384     dendrogram(adata, groupby, key_added=dendrogram_key)
   2386 if "dendrogram_info" not in adata.uns[dendrogram_key]:
   2387     raise ValueError(
   2388         f"The given dendrogram key ({dendrogram_key!r}) does not contain "
   2389         "valid dendrogram information."
   2390     )

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/legacy_api_wrap/__init__.py:80, in legacy_api.<locals>.wrapper.<locals>.fn_compatible(*args_all, **kw)
     77 @wraps(fn)
     78 def fn_compatible(*args_all: P.args, **kw: P.kwargs) -> R:
     79     if len(args_all) <= n_positional:
---> 80         return fn(*args_all, **kw)
     82     args_pos: P.args
     83     args_pos, args_rest = args_all[:n_positional], args_all[n_positional:]

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/scanpy/tools/_dendrogram.py:121, in dendrogram(adata, groupby, n_pcs, use_rep, var_names, use_raw, cor_method, linkage_method, optimal_ordering, key_added, inplace)
     25 @old_positionals(
     26     "n_pcs",
     27     "use_rep",
   (...)
     49     inplace: bool = True,
     50 ) -> dict[str, Any] | None:
     51     """\
     52     Computes a hierarchical clustering for the given `groupby` categories.
     53 
   (...)
    118     >>> sc.pl.dotplot(adata, markers, groupby='bulk_labels', dendrogram=True)
    119     """
--> 121     raise_not_implemented_error_if_backed_type(adata.X, "dendrogram")
    122     if isinstance(groupby, str):
    123         # if not a list, turn into a list
    124         groupby = [groupby]

File ~/.local/share/hatch/env/virtual/scverse-tutorials/_YRPCeuX/basic-scrna/lib/python3.12/site-packages/scanpy/_utils/__init__.py:1100, in raise_not_implemented_error_if_backed_type(X, method_name)
   1098 def raise_not_implemented_error_if_backed_type(X: object, method_name: str) -> None:
   1099     if is_backed_type(X):
-> 1100         raise NotImplementedError(
   1101             f"{method_name} is not implemented for matrices of type {type(X)}"
   1102         )

NotImplementedError: dendrogram is not implemented for matrices of type <class 'anndata._core.sparse_dataset.CSRDataset'>

Versions

```
-----
anndata     0.10.8
scanpy      1.10.2
-----
PIL                 10.3.0
asttokens           NA
comm                0.2.2
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0.post0
debugpy             1.8.1
decorator           5.1.1
executing           2.0.1
h5py                3.11.0
igraph              0.11.4
ipykernel           6.29.4
jedi                0.19.1
joblib              1.4.0
kiwisolver          1.4.5
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.42.0
matplotlib          3.8.4
mpl_toolkits        NA
natsort             8.4.0
numba               0.59.1
numpy               1.26.4
packaging           24.0
pandas              2.2.2
parso               0.8.4
platformdirs        4.2.1
prompt_toolkit      3.0.43
psutil              5.9.8
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.17.2
pyparsing           3.1.2
pytz                2024.1
scipy               1.13.0
session_info        1.0.0
six                 1.16.0
sklearn             1.4.2
stack_data          0.6.3
texttable           1.7.0
threadpoolctl       3.5.0
tornado             6.4
traitlets           5.14.3
vscode              NA
wcwidth             0.2.13
zmq                 26.0.3
-----
IPython             8.24.0
jupyter_client      8.6.1
jupyter_core        5.7.2
-----
Python 3.12.4 (main, Jun 7 2024, 06:33:07) [GCC 14.1.1 20240522]
Linux-6.10.3-zen1-1-zen-x86_64-with-glibc2.40
-----
Session information updated at 2024-08-06 12:18
```

flying-sheep commented 1 month ago

I made a sketch for an implementation in https://github.com/scverse/scanpy/pull/3204, but aggregate currently only works with in-memory data. It might be a good idea to fix that as well!
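
For reference, one way an aggregation could avoid loading everything at once is to accumulate per-group sums and counts over row chunks. The following is only a sketch under assumptions: `chunked_group_means` is a hypothetical name, it is not how sc.get.aggregate is implemented, and it assumes that `X` supports row slicing which returns a dense chunk (a sparse chunk would need to be densified or summed differently).

```python
import numpy as np


def chunked_group_means(X, codes, n_groups, chunk_size=1000):
    """Per-group feature means, reading X one row chunk at a time (hypothetical helper).

    X: anything with `.shape` and row slicing `X[start:stop]` (dense chunks assumed).
    codes: integer group code per row, in the range 0..n_groups-1.
    """
    codes = np.asarray(codes)
    n_obs, n_vars = X.shape
    sums = np.zeros((n_groups, n_vars), dtype=np.float64)
    counts = np.zeros(n_groups, dtype=np.int64)
    for start in range(0, n_obs, chunk_size):
        stop = min(start + chunk_size, n_obs)
        # Only this slice of rows is materialized in memory.
        chunk = np.asarray(X[start:stop], dtype=np.float64)
        chunk_codes = codes[start:stop]
        # Scatter-add each row into the accumulator of its group.
        np.add.at(sums, chunk_codes, chunk)
        counts += np.bincount(chunk_codes, minlength=n_groups)
    # Groups without any rows end up as NaN instead of raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts[:, None]
```

Peak memory is then bounded by the chunk size plus the small groups x features accumulator, regardless of the number of cells.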