Celine-Serry closed this issue 6 months ago.
Hi @Celine-075,
Constructing the neighborhood graph is not guaranteed to be reproducible across different machines, package versions, or numbers of CPUs
(see also https://github.com/scverse/scanpy/issues/2014)
If all of these things are constant between your two versions, then it's a bug.
Also: are you sure the clustering has actually changed (e.g. by comparing the cell barcodes in each cluster)? Or is it just the UMAP that looks different, while the clusters are the same?
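To compare clusterings by barcode rather than by eye, here is a small sketch. It assumes the labels from each run (e.g. `adata.obs["leiden"]`) were saved as pandas Series indexed by cell barcode. `same_partition` is a hypothetical helper. It ignores how the clusters are numbered, since cluster IDs are arbitrary, and only checks whether the same cells are grouped together. The barcodes below are toy values.

```python
import pandas as pd


def same_partition(labels_a: pd.Series, labels_b: pd.Series) -> bool:
    """Hypothetical helper: True if two clusterings group the same cells together,
    regardless of how the clusters are numbered. Both Series are indexed by barcode."""
    # Match cells one-to-one by barcode; keep only cells present in both runs
    a, b = labels_a.align(labels_b, join="inner")
    pairs = pd.DataFrame({"a": a.astype(str), "b": b.astype(str)})
    n_pairs = len(pairs.drop_duplicates())
    # Identical partitions <=> every old cluster maps to exactly one new cluster and vice versa
    return n_pairs == pairs["a"].nunique() == pairs["b"].nunique()


# Toy labels: run2 only swaps the cluster names, run3 moves cell AAAG to the other cluster
run1 = pd.Series(["0", "0", "1", "1"], index=["AAAC", "AAAG", "TTTC", "TTTG"])
run2 = pd.Series(["0", "0", "1", "1"], index=["TTTC", "TTTG", "AAAC", "AAAG"])
run3 = pd.Series(["0", "1", "1", "1"], index=["AAAC", "AAAG", "TTTC", "TTTG"])

print(same_partition(run1, run2))  # True
print(same_partition(run1, run3))  # False
```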
Ah, I see. I did produce the results on the same machine with the same package version and number of CPUs.
The clustering does seem to have changed, which is visible in these plots. This is the unclustered UMAP, where you can see that the bottom group in cluster 1 (orange) was pulled toward the group in cluster 2 (green) when I clustered the data initially:
So here you see that part of cluster 1 is actually added to cluster 2 (which also makes sense when looking at the expression profiles of those groups).
@Celine-075, I'm confused about this statement:
> When I previously performed Leiden clustering on my data, the shape of the UMAP changed, as expected.
This is not expected unless you recompute the UMAP. What do you mean by "clustered UMAP"?
> So here you see that part of cluster 1 is actually added to cluster 2 (which also makes sense when looking at the expression profiles of those groups).
I'm not sure I can see that, since it's not obvious which point in the first plot corresponds to a point in the other plot. I think a confusion matrix (or using the same UMAP layout) would be a more appropriate way to compare the clusterings here.
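Here is a sketch of such a confusion matrix built with `pandas.crosstab`. It assumes the two runs' labels are pandas Series indexed by barcode. `cluster_confusion` and the toy labels are hypothetical. Rows are clusters from the old run and columns are clusters from the new run, so off-diagonal counts show how many cells moved between clusters. In the toy data, cell `c3` moves from cluster 1 to cluster 0.

```python
import pandas as pd


def cluster_confusion(old: pd.Series, new: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: count cells (matched by barcode) for every
    (old cluster, new cluster) pair."""
    old, new = old.align(new, join="inner")
    return pd.crosstab(old.rename("old"), new.rename("new"))


old = pd.Series(["0", "0", "1", "1", "1"], index=["c1", "c2", "c3", "c4", "c5"])
new = pd.Series(["0", "0", "0", "1", "1"], index=["c1", "c2", "c3", "c4", "c5"])

table = cluster_confusion(old, new)
print(table)
```

An alternative is to keep both label columns in one AnnData and color the same UMAP layout by each, as suggested above.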
Hmm okay, I thought Leiden clustering pulls cells with similar expression closer to each other in UMAP space? By "clustered UMAP" I mean the UMAP produced after I performed Leiden clustering on the data. By "unclustered" I mean that I just plotted the UMAP without calculating the Leiden clusters.
Then I don't know what happened, but when I plotted the UMAP without Leiden clustering, it had a different shape than after I calculated the Leiden clusters. I will check the confusion matrix and come back when I have the results. In the meantime, I can only post this image, where I put both UMAPs next to each other and drew what I meant about part of cluster 1 being added to cluster 2 after performing the Leiden clustering:
> Hmm okay, I thought Leiden clustering pulls cells with similar expression closer to each other in UMAP space?
No. Leiden clustering just tries to divide the data into a discrete set of clusters. The only output of clustering is a cluster label for each point and some parameters.
I do think people overload the term "cluster", so the confusion here is understandable.
A new UMAP will be generated if you call `sc.tl.umap`. E.g.:
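```python
import scanpy as sc

# adata: your preprocessed AnnData object (PCA already computed)
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=20)  # Builds the kNN graph used by both steps below
sc.tl.umap(adata)  # Computes a UMAP layout
sc.tl.leiden(adata, resolution=0.2)  # Computes a clustering
sc.pl.umap(adata, color="leiden")
```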
If you just want to cluster with different parameters, you can call `sc.tl.leiden` again. See for example the clustering at multiple resolutions here: https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering.html#manual-cell-type-annotation
I would note that both `leiden` and `umap` rely on the graph generated by `sc.pp.neighbors`.
Ah, I see now that recalculating `sc.pp.neighbors` fixes my problem. Thanks!
Great to hear! I’ll close this then, but if you have more questions or concerns about reproducibility, feel free to comment here or make a new issue!
What happened?
When I previously performed Leiden clustering on my data, the shape of the UMAP changed, as expected.
However, when I now try to reproduce my results, I suddenly only get a Leiden clustering that follows the distribution of the unclustered UMAP.
Unclustered UMAP:
Clustered UMAP:
The dataset with which I produced the clustered UMAP:
The dataset with which I produced the unclustered UMAP:
I've tried to check whether the data is somehow different, but I don't see anything that could be causing these differences. Could you please help me figure out why the Leiden clustering suddenly produces different results?
Minimal code sample
Error output
No response
Versions