cornhundred closed this issue 3 years ago
We can also reproduce this issue using the Scanpy pbmc3k tutorial example.
That is definitely disconcerting. I'll try to look into what the issue may be. It looks rather like you are just getting the spectral initialization instead of the UMAP embedding out.
So first of all basic UMAP seems to be working and doesn't produce results like this (as you note that the basic usage tutorial seems to work). That's a good start, as at least the package itself isn't broken. That means it is most likely in the interaction of scanpy and UMAP.
My best guess, on first glance, is that some parameter options may have shuffled around. Ideally everything should be keyword only (I haven't followed scikit-learn in enforcing that and making that standard yet, but this is a good reason why I should), but if it is positionally called in scanpy that might be the problem.
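To make the positional-argument concern concrete, here is a minimal sketch of the failure mode that guess describes. The function names and signatures are hypothetical and are not umap-learn's or scanpy's real API; they only show how a positional call can silently change meaning when a parameter is inserted, and how keyword-only parameters prevent that.

```python
# Hypothetical signatures, for illustration only; not umap-learn's real API.

def embed_v1(data, n_components, n_epochs):
    """Older signature: n_epochs is the third positional parameter."""
    return {"n_components": n_components, "n_epochs": n_epochs}


def embed_v2(data, n_components, init, n_epochs=None):
    """Newer signature: a parameter was inserted before n_epochs."""
    return {"n_components": n_components, "init": init, "n_epochs": n_epochs}


def embed_kwonly(data, *, n_components, init="spectral", n_epochs=None):
    """Everything after the bare `*` must be passed by keyword."""
    return {"n_components": n_components, "init": init, "n_epochs": n_epochs}


data = [[0.0, 1.0], [1.0, 0.0]]

# A positional call written against v1 silently changes meaning under v2:
old = embed_v1(data, 2, 200)   # 200 is n_epochs
new = embed_v2(data, 2, 200)   # 200 now lands in `init`; n_epochs stays None

# The keyword-only variant rejects the same positional call outright.
try:
    embed_kwonly(data, 2, 200)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With keyword-only parameters the mistake surfaces as a TypeError at the call site instead of as a subtly different embedding.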
Alright, I think I see the problem. Included in 0.5.2 is commit e442bcd9323fd218fc4a3a6287baa1067512dfe1, which allows n_epochs to be zero to get the initial embedding out, which several people wanted. Unfortunately, internally scanpy sets n_epochs = 0, which used to be a way to get an automatically set value. That now needs to be n_epochs=None. You can work around this right now by setting maxiter in the scanpy call. A value of 200 is probably good.
Edit: I should note that this is in calls to an internal umap function, and not the public API, which remained the same.
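The change in meaning of the sentinel value can be sketched roughly as follows. This is a simplified model and not the actual umap-learn or scanpy source: the function names are hypothetical, and the default epoch counts (500 for small datasets, 200 for large ones, with a 10000-sample cutoff) are an assumption used only to make the example concrete.

```python
# Simplified, hypothetical model of the behavior described above;
# not the actual umap-learn or scanpy source.

def default_epochs(n_samples):
    # Assumed heuristic: use more epochs for smaller datasets.
    return 500 if n_samples <= 10000 else 200


def resolve_epochs_old(n_epochs, n_samples):
    """Before 0.5.2 (as described): 0 means 'pick a value automatically'."""
    if n_epochs is None or n_epochs <= 0:
        return default_epochs(n_samples)
    return n_epochs


def resolve_epochs_new(n_epochs, n_samples):
    """In 0.5.2 (as described): None means automatic, 0 means no optimization."""
    if n_epochs is None:
        return default_epochs(n_samples)
    return n_epochs


def caller_n_epochs(maxiter=None, use_none_sentinel=False):
    """Hypothetical caller: forward maxiter if given, otherwise a sentinel."""
    if maxiter is not None:
        return maxiter
    return None if use_none_sentinel else 0


n_samples = 3000  # arbitrary small dataset size for the example

before = resolve_epochs_old(caller_n_epochs(), n_samples)            # automatic value
after = resolve_epochs_new(caller_n_epochs(), n_samples)             # zero epochs
workaround = resolve_epochs_new(caller_n_epochs(maxiter=200), n_samples)
fixed = resolve_epochs_new(caller_n_epochs(use_none_sentinel=True), n_samples)
```

In this model, a caller that still passes 0 gets zero optimization epochs under the new behavior, which is consistent with seeing only the initialization. Passing an explicit maxiter, or switching the sentinel to None, restores a nonzero epoch count.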
Thanks @lmcinnes for the quick response. Would you all want to roll back that change, since it effectively changed the API but the version number would be interpreted as only a bug fix? Otherwise Scanpy will have to roll out some sort of update.
It was an update to a function that isn't part of the public facing API, so I was not anticipating issues. I've submitted a change to scanpy that should resolve the issue. I would be happy to discuss options, but would rather not make a roll-back release if I don't have to.
OK, that makes sense. Thanks again for the quick response.
Closing since this is something that Scanpy will resolve.
Hi, we are seeing unexpected UMAP embeddings using the 0.5.2 umap-learn version, run via Scanpy, with our single cell gene expression data (publicly available MERFISH data from Vizgen).
Our original embedding using version 0.5.1 looks like
and the embedding with 0.5.2 looks like
Zooming into the 0.5.2 embedding reveals that cells appear to be embedded into a lattice-like structure
We're wondering if this is being caused in part by some sort of rounding error in the embedding.
We have included Colab notebooks demonstrating the normal behavior using version 0.5.1 and the new unexpected behavior using version 0.5.2. Please let us know if you have any issues running the notebooks. They require authentication via Google to load the publicly available data, and they include static and interactive versions of the UMAP embeddings.
Colab UMAP-Learn_0.5.1.ipynb
Colab UMAP-Learn_0.5.2.ipynb
The only difference between the notebooks is whether we use pip to install a specific version of umap-learn or use Scanpy's version.
We also tested the basic usage examples from the documentation, and these appear to work with the new 0.5.2 version; see the Colab notebook Basic_Usage_Test-UMAP_0.5.2.ipynb