scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

scanpy.tl.umap after bbknn #1989

Open guensen0 opened 2 years ago

guensen0 commented 2 years ago

Hi, I get an error when running tl.umap after bbknn batch correction. It is new in version 1.7.2.

Minimal code sample (that we can copy&paste without having any data)

adata_bbknn = bbknn.bbknn(adata, batch_key = metacol, n_pcs = number_of_pcs_for_reduction,copy=True)
scanpy.tl.umap(adata_bbknn, min_dist=0.2, spread=2, n_components=3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-73-a5a2e6833485> in <module>()
      1 adata_bbknn = bbknn.bbknn(adata, batch_key = metacol, n_pcs = number_of_pcs_for_reduction,copy=True)
----> 2 scanpy.tl.umap(adata_bbknn, min_dist=0.2, spread=2, n_components=3)

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/scanpy/tools/_umap.py in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy, method, neighbors_key)
    205             neigh_params.get('metric', 'euclidean'),
    206             neigh_params.get('metric_kwds', {}),
--> 207             verbose=settings.verbosity > 3,
    208         )
    209     elif method == 'rapids':

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/umap_.py in simplicial_set_embedding(data, graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, output_metric, output_metric_kwds, euclidean_output, parallel, verbose)
   1037             random_state,
   1038             metric=metric,
-> 1039             metric_kwds=metric_kwds,
   1040         )
   1041         expansion = 10.0 / np.abs(initialisation).max()

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/spectral.py in spectral_layout(data, graph, dim, random_state, metric, metric_kwds)
    304             random_state,
    305             metric=metric,
--> 306             metric_kwds=metric_kwds,
    307         )
    308 

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/spectral.py in multi_component_layout(data, graph, n_components, component_labels, dim, random_state, metric, metric_kwds)
    191             random_state,
    192             metric=metric,
--> 193             metric_kwds=metric_kwds,
    194         )
    195     else:

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/spectral.py in component_layout(data, n_components, component_labels, dim, random_state, metric, metric_kwds)
    120             else:
    121                 distance_matrix = pairwise_distances(
--> 122                     component_centroids, metric=metric, **metric_kwds
    123                 )
    124 

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/sklearn/metrics/pairwise.py in pairwise_distances(X, Y, metric, n_jobs, force_all_finite, **kwds)
   1738         raise ValueError("Unknown metric %s. "
   1739                          "Valid metrics are %s, or 'precomputed', or a "
-> 1740                          "callable" % (metric, _VALID_METRICS))
   1741 
   1742     if metric == "precomputed":

ValueError: Unknown metric angular. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or 'precomputed', or a callable

Versions

-----
anndata 0.7.5
scanpy 1.7.2
sinfo 0.3.1
-----
PIL 8.0.1
anndata 0.7.5
annoy NA
bbknn NA
cached_property 1.5.1
cairo 1.20.0
cffi 1.14.4
colorama 0.4.4
cycler 0.10.0
cython_runtime NA
dateutil 2.8.1
decorator 4.4.2
get_version 2.1
h5py 3.1.0
igraph 0.8.3
ipykernel 5.3.4
ipython_genutils 0.2.0
joblib 0.17.0
kiwisolver 1.3.1
legacy_api_wrap 0.0.0
leidenalg 0.8.3
llvmlite 0.34.0
louvain 0.6.1
matplotlib 3.3.3
mpl_toolkits NA
natsort 7.1.0
numba 0.51.2
numexpr 2.7.1
numpy 1.19.4
packaging 20.4
pandas 1.1.4
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 1.0.15
psutil 5.8.0
ptyprocess 0.6.0
pycparser 2.20
pygments 2.7.2
pyparsing 2.4.7
pytz 2020.4
scanpy 1.7.2
scipy 1.5.3
seaborn 0.11.0
setuptools_scm NA
simplegeneric NA
sinfo 0.3.1
six 1.15.0
sklearn 0.23.2
sphinxcontrib NA
statsmodels 0.12.1
storemagic NA
tables 3.6.1
texttable 1.6.3
tornado 6.1
traitlets 5.0.5
typing_extensions NA
umap 0.4.6
wcwidth 0.2.5
zipp NA
zmq 20.0.0
-----
IPython 5.8.0
jupyter_client 6.1.7
jupyter_core 4.7.0
-----
Python 3.7.8 | packaged by conda-forge | (default, Nov 27 2020, 19:24:58) [GCC 9.3.0]
Linux-4.9.0-16-amd64-x86_64-with-debian-9.13
8 logical CPU cores
-----
Session information updated at 2021-09-01 08:49

katieaney commented 2 years ago

@guensen0 Did you ever solve this issue?

iaaka commented 1 year ago

Just ran into the same problem and found a solution. I'll post it here in case anyone needs it. Briefly: adata.uns['neighbors']['params']['metric'] = 'cosine' will do the trick (or choose any other valid metric).
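A minimal sketch of that workaround, written against a plain dict so it runs without any data. The helper name patch_umap_metric and its fallback and unsupported arguments are made up for illustration and are not part of scanpy or bbknn. With a real object you would pass adata.uns instead of the stand-in dict.

```python
def patch_umap_metric(uns, fallback="cosine", unsupported=("angular",)):
    """Swap a stored neighbor metric that the UMAP layout step rejects.

    `uns` is any mapping shaped like adata.uns. Returns the metric that
    was stored before the call. Hypothetical helper, not a scanpy API.
    """
    params = uns["neighbors"]["params"]
    previous = params.get("metric", "euclidean")
    if previous in unsupported:
        # Only the stored metric name changes; the graph is not recomputed.
        params["metric"] = fallback
    return previous


# Stand-in for adata.uns as it looks when the stored metric is 'angular'.
uns = {"neighbors": {"params": {"metric": "angular"}}}

old_metric = patch_umap_metric(uns)
print(old_metric, "->", uns["neighbors"]["params"]["metric"])
```

After patching, scanpy.tl.umap can be called again as in the original report. Judging from the traceback, the metric name is only used to place disconnected components relative to each other, so the choice of fallback should mainly affect that initial layout.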

I'm not completely sure, but it seems to happen when the neighbour graph consists of more than one connected component. In that case umap needs to estimate the distances between the components. It takes the metric name from adata.uns['neighbors']['params']['metric'], and angular is not supported there (the call ends up in sklearn's pairwise_distances, as the traceback above shows), which causes the error. The strange thing is that the example given by @guensen0 uses the defaults, which at least now are euclidean. Maybe it was different in the past. In any case, the above-mentioned trick solved the problem for me.
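To check whether the graph really is split into several components, something like the following could be used. It is a sketch using a small union-find over the edge list; count_components is a hypothetical helper, and the toy edges stand in for a real graph. For real data, assuming the connectivities are stored in adata.obsp['connectivities'], the edge list would come from its .nonzero() method and n_nodes from adata.n_obs.

```python
def count_components(n_nodes, rows, cols):
    """Count connected components of an undirected graph given as edge lists."""
    parent = list(range(n_nodes))

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in zip(rows, cols):
        root_i, root_j = find(int(i)), find(int(j))
        if root_i != root_j:
            parent[root_i] = root_j

    return len({find(i) for i in range(n_nodes)})


# Toy graph with two components: 0-1-2 and 3-4.
# With real data (assumption about where the graph is stored):
#   rows, cols = adata.obsp["connectivities"].nonzero()
#   n_components = count_components(adata.n_obs, rows, cols)
rows = [0, 1, 3]
cols = [1, 2, 4]
n_components = count_components(5, rows, cols)
print(n_components)  # 2
```

A result greater than 1 means the multi-component code path seen in the traceback is taken. scipy.sparse.csgraph.connected_components gives the same count directly from the sparse matrix.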

Other options are: 1) make sure that the neighbour graph is fully connected (increase the number of neighbours), or 2) use a metric that is supported by both bbknn and umap (almost all except angular).