scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.92k stars 598 forks source link

Change default leiden clustering backend to `igraph`, and reduce default number of iterations #2865

Open ivirshup opened 8 months ago

ivirshup commented 8 months ago

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

On the recommendation of the library authors, we should change the default leiden backend to igraph (https://github.com/scverse/scanpy/issues/1053) now that this has been made available in https://github.com/scverse/scanpy/pull/2815).

At the same time, we should change the number of default iterations. Right now, we iterate until convergence. This can be very slow, especially for large datasets. We should probably just stick with the default of the underlying library (which is 2).

ilan-gold commented 7 months ago

FutureWarning for the next release to begin the process.

aeisenbarth commented 5 months ago

I first understood the warning that the default will be switched so that we have to be explicit (flavor="leidenalg") if the old default works for us. But the warning persists, maybe suggesting that we should switch to flavor="igraph".

When switching to igraph, we get a deprecation warning resolution_parameter keyword argument is deprecated, use resolution=... instead which has been deprecated in igraph some years ago (7848bcb). Scanpy has not yet updated the parameter since the initial leiden implementation, or the deprecation was not noticed. Can users ignore the warning, or should users put an upper cap on the igraph version to be safe?

ilan-gold commented 5 months ago

@aeisenbarth Could you provide a small code sample or point to one of our unit tests where this happens? I am not seeing what you are referring to. For example:

import scanpy as sc
adata = sc.datasets.pbmc3k()
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, flavor="igraph", resolution=5, directed=False, n_iterations=2, copy=True)
sc.tl.leiden(adata, flavor="leidenalg", resolution=5, copy=True)

do not seem to yield the warnings you describe. Thanks!

aeisenbarth commented 5 months ago

DeprecationWarning is silenced by default, for not annoying users. But I get all when running unit tests with pytest.

Here with all warnings enabled:

>>> import warnings
>>> warnings.filterwarnings("always")
>>> import scanpy as sc
>>> adata = sc.datasets.pbmc3k()
>>> sc.pp.pca(adata)
>>> sc.pp.neighbors(adata)
>>> sc.tl.leiden(adata, flavor="igraph", n_iterations=2)
…/lib/python3.9/site-packages/scanpy/tools/_leiden.py:185: DeprecationWarning: resolution_parameter keyword argument is deprecated, use resolution=... instead
  part = g.community_leiden(**clustering_args)
$ pip freeze | grep -E "anndata|scanpy|igraph|python-igraph|leidenalg"
anndata==0.10.7
igraph==0.11.5
leidenalg==0.10.2
python-igraph==0.11.5
scanpy==1.10.1
ilan-gold commented 5 months ago

@aeisenbarth I still cannot reproduce this warning, which is very frustrating. But I also don't understand how this could be happening. Looking at the code:

https://github.com/scverse/scanpy/blob/3ba3f46b4e6e77e8c6f0551db9663822097b486a/scanpy/tools/_leiden.py#L163-L184

I cannot see how resolution_parameter would end up in the community_leiden call. I'll keep looking into this a bit more.

ilan-gold commented 5 months ago

Ah I see @aeisenbarth: https://github.com/scverse/scanpy/pull/2999 solved this, so this will be resolved in an upcoming release: https://github.com/scverse/scanpy/blob/main/docs/release-notes/1.10.2.md