scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

UMAP 0.4.0rc1 doesn't play nice with ingest #1036

Closed ivirshup closed 4 years ago

ivirshup commented 4 years ago

@Koncopd, I just tried out the new release candidate for umap and get errors throughout the ingest tests. It looks like umap now relies on pynndescent, and some functions are no longer available. Here's an example traceback:

------------------------------------------------------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------------------------------------------------------
running ingest
______________________________________________________________________________________________________________ test_ingest_map_embedding_umap ______________________________________________________________________________________________________________

    def test_ingest_map_embedding_umap():
        adata_ref = sc.AnnData(X)
        adata_new = sc.AnnData(T)

        sc.pp.neighbors(
            adata_ref, method='umap', use_rep='X', n_neighbors=4, random_state=0
        )
        sc.tl.umap(adata_ref, random_state=0)

>       ing = sc.tl.Ingest(adata_ref)

scanpy/tests/test_ingest.py:132: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
scanpy/tools/_ingest.py:270: in __init__
    self._init_neighbors(adata)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <scanpy.tools._ingest.Ingest object at 0x140357550>, adata = AnnData object with n_obs × n_vars = 6 × 5 
    uns: 'neighbors', 'umap'
    obsm: 'X_umap'

    def _init_neighbors(self, adata):
        from umap.distances import named_distances
>       from umap.nndescent import (
            make_initialisations,
            make_initialized_nnd_search,
        )
E       ImportError: cannot import name 'make_initialisations' from 'umap.nndescent' (/usr/local/lib/python3.7/site-packages/umap/nndescent.py)

scanpy/tools/_ingest.py:210: ImportError
Koncopd commented 4 years ago

@ivirshup, thank you. I can fix this by moving this part from umap to pynndescent directly. This will add pynndescent to the requirements, but it seems to be indirectly required by the new umap anyway.
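
A fallback import is one common way to tolerate a function moving between modules. The sketch below is not the fix that was applied in scanpy. `import_first_available` is a hypothetical helper, and it is demonstrated on standard-library modules because the thread does not confirm where these functions live in pynndescent.

```python
import importlib


def import_first_available(name, module_candidates):
    """Return attribute `name` from the first candidate module that provides it.

    Hypothetical helper for illustration only; not part of scanpy or umap.
    """
    for module_name in module_candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is not installed (or was removed); try the next candidate.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"cannot import {name!r} from any of {module_candidates}")


# Demonstration with stdlib modules: the first candidate does not exist,
# so the helper falls back to `math`.
sqrt = import_first_available("sqrt", ["module_that_does_not_exist", "math"])
```

The same pattern would take the old and new module paths as candidates, once the new location has been checked against the installed package.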

ivirshup commented 4 years ago

@Koncopd This test is currently failing for me:

$ pytest -k test_ingest
===================================================== test session starts =====================================================
platform darwin -- Python 3.7.6, pytest-5.3.5, py-1.8.0, pluggy-0.12.0
rootdir: /Users/isaac/github/scanpy, inifile: pytest.ini, testpaths: scanpy/tests/
plugins: pylama-7.7.1, parallel-0.0.10, cov-2.7.1, black-0.3.7, hypothesis-5.6.0
collected 393 items / 389 deselected / 4 skipped                                                                              

scanpy/tests/test_ingest.py ...F                                                                                        [100%]

========================================================== FAILURES ===========================================================
_______________________________________________ test_ingest_map_embedding_umap ________________________________________________

    def test_ingest_map_embedding_umap():
        adata_ref = sc.AnnData(X)
        adata_new = sc.AnnData(T)

        sc.pp.neighbors(
            adata_ref, method='umap', use_rep='X', n_neighbors=4, random_state=0
        )
        sc.tl.umap(adata_ref, random_state=0)

        ing = sc.tl.Ingest(adata_ref)
        ing.fit(adata_new)
        ing.map_embedding(method='umap')

        reducer = UMAP(min_dist=0.5, random_state=0, n_neighbors=4)
        reducer.fit(X)
        umap_transformed_t = reducer.transform(T)

>       assert np.allclose(ing._obsm['X_umap'], umap_transformed_t)
E       assert False
E        +  where False = <function allclose at 0x119616b00>(array([[16.566338, 20.174282],\n       [15.368203, 20.291983]], dtype=float32), array([[16.502459, 20.157679],\n       [15.581459, 20.302881]], dtype=float32))
E        +    where <function allclose at 0x119616b00> = np.allclose

scanpy/tests/test_ingest.py:140: AssertionError
---------------------------------------------------- Captured stderr call -----------------------------------------------------
computing neighbors
    finished: added to `.uns['neighbors']`
    'distances', distances for each pair of neighbors
    'connectivities', weighted adjacency matrix (0:00:00)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:00)

With these versions:

>>> sc.logging.print_versions()                                                                                            
scanpy==1.4.5.2.dev37+g51dc038 anndata==0.7.2.dev13+g4440b90.d20200316 umap==0.4.0rc1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1
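
To see how far apart the two results are, the check behind `np.allclose` can be reproduced by hand on the values from the assertion message. This is a pure-Python sketch of the documented elementwise rule `|a - b| <= atol + rtol * |b|` with NumPy's default tolerances. The helper name `within_tolerance` is made up for illustration.

```python
# Values copied from the assertion message in the failing test output.
ingest_coords = [[16.566338, 20.174282], [15.368203, 20.291983]]
umap_coords = [[16.502459, 20.157679], [15.581459, 20.302881]]

RTOL, ATOL = 1e-05, 1e-08  # default tolerances of np.allclose


def within_tolerance(a, b, rtol=RTOL, atol=ATOL):
    """Elementwise rule used by np.allclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


# Flatten both 2x2 arrays into matching (ingest, umap) value pairs.
pairs = [
    (a, b)
    for row_a, row_b in zip(ingest_coords, umap_coords)
    for a, b in zip(row_a, row_b)
]

all_close = all(within_tolerance(a, b) for a, b in pairs)
max_abs_diff = max(abs(a - b) for a, b in pairs)

print(all_close)
print(max_abs_diff)
```

The largest gap is roughly 0.21, while the allowed tolerance at these magnitudes is on the order of 1e-4. The mismatch is therefore far larger than floating-point noise.
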
Koncopd commented 4 years ago

@ivirshup OK, thanks, I'll check.

nahanoo commented 4 years ago

I stumbled across the same error with scanpy==1.4.5.1 anndata==0.7.1 umap==0.4.2.

I did not quite understand the solution to this issue. What should I do?

Best wishes

Koncopd commented 4 years ago

@nahanoo Hi, there are 3 options for now:

  1. downgrading umap to 0.3.9
  2. installing scanpy from github
  3. waiting for a new release of scanpy.
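
To tell whether an environment is affected before choosing one of these options, a small version check can help. This is a minimal sketch. The function name `umap_breaks_ingest` is hypothetical, and the 0.4 threshold is an assumption based only on the versions reported in this thread (0.4.0rc1 and 0.4.2 failing with scanpy 1.4.5.x).

```python
import re


def umap_breaks_ingest(umap_version):
    """Return True if the given umap version string is 0.4 or newer.

    Assumption from this thread only: scanpy 1.4.5.x ingest fails with
    umap >= 0.4 and works with the 0.3 series.
    """
    match = re.match(r"(\d+)\.(\d+)", umap_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {umap_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles release candidates such as "0.4.0rc1",
    # because only the major and minor components are inspected.
    return (major, minor) >= (0, 4)


for version in ["0.3.10", "0.4.0rc1", "0.4.2"]:
    print(version, umap_breaks_ingest(version))
```

On Python 3.8 or later, the installed version string can be read with `importlib.metadata.version("umap-learn")` and passed to this function.
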
nahanoo commented 4 years ago

Thank you for the fast response. Building from source did the job for me.