scverse / scanpy

Occasional dramatic differences between tSNE and UMAP #319

Closed (jorvis closed this issue 6 years ago)

jorvis commented 6 years ago

On a test dataset I compute neighbors and then immediately compute/plot both tSNE and UMAP and show them next to each other. Sometimes, we get pretty dramatic differences such as the one attached. Is this an algorithmic difference or something wrong with my approach?

import scanpy as sc

# UMAP uses the neighbor graph computed here; tSNE works from the PCA representation.
sc.pp.neighbors(adata, n_pcs=n_pcs, n_neighbors=n_neighbors)
sc.tl.tsne(adata, n_pcs=n_pcs, random_state=random_state)
sc.tl.umap(adata)

sc.pl.tsne(adata, color=genes_to_color, color_map='RdBu_r', use_raw=False, save=".png")
sc.pl.umap(adata, color=genes_to_color, color_map='RdBu_r', use_raw=False, save=".png")

[Image: screenshot from 2018-10-22 11-57-49]

chlee-tabin commented 6 years ago

This is normal. It means that the far-away clusters are "globally" more different from the cells that sit closer together. UMAP tends to preserve global distances, whereas tSNE largely ignores them (so one should not draw inferences about global distance from a tSNE plot). I frequently see this kind of UMAP when some very different contaminating cell types are in the sample.

falexwolf commented 6 years ago

UMAP also has no meaning attached when clusters are completely disconnected (Supplemental Figure 10 of this, soon to be updated here on bioRxiv and finally in a journal...), and I'd tend to think that this is such a case. In that case, UMAP's parameters have to be adjusted (mostly min_dist and spread).
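As a rough illustration of adjusting min_dist and spread, the sketch below runs sc.tl.umap over a small parameter grid and keeps each embedding under its own key. The helper names, the grid values, and the obsm key naming are all hypothetical, and the sketch assumes sc.pp.neighbors has already been run on adata.

```python
from itertools import product


def umap_param_grid(min_dists, spreads):
    """Return every (min_dist, spread) combination as a list of keyword dicts."""
    return [
        {"min_dist": min_dist, "spread": spread}
        for min_dist, spread in product(min_dists, spreads)
    ]


def run_umap_grid(adata, grid):
    """Run sc.tl.umap once per parameter set and keep each embedding.

    Assumes sc.pp.neighbors(adata) has already been run.
    """
    import scanpy as sc  # imported lazily so the grid helper works without scanpy

    keys = []
    for params in grid:
        sc.tl.umap(adata, **params)  # writes the embedding to adata.obsm["X_umap"]
        # Hypothetical key naming, used here only to keep the results apart.
        key = f"X_umap_md{params['min_dist']}_sp{params['spread']}"
        adata.obsm[key] = adata.obsm["X_umap"].copy()
        keys.append(key)
    return keys


# Example grid values (assumptions, not recommendations).
grid = umap_param_grid(min_dists=[0.1, 0.5], spreads=[1.0, 2.0])
```

Calling run_umap_grid(adata, grid) would then leave four embeddings in adata.obsm that can be compared. Which combination reads best depends on the dataset.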

It's true that UMAP has less tendency to tear apart connected things than tSNE. Overall, it's more faithful to the global topology.
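One way to check whether a dataset falls into the "completely disconnected" case is to count the connected components of its k-nearest-neighbor graph. The sketch below does this on toy data with a brute-force kNN and a plain graph traversal. It is not scanpy's implementation, and the function names and toy points are made up for illustration. On real data, the same count could be obtained by passing the sparse connectivities matrix that sc.pp.neighbors computes to scipy.sparse.csgraph.connected_components.

```python
import numpy as np


def knn_edges(points, k):
    """Return the undirected kNN edges (i, j) with i < j, computed by brute force."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances; only sensible for small toy data.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    edges = set()
    for i, row in enumerate(nearest):
        for j in row:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges


def count_components(n_nodes, edges):
    """Count connected components with an iterative depth-first traversal."""
    adjacency = [[] for _ in range(n_nodes)]
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    seen = [False] * n_nodes
    n_components = 0
    for start in range(n_nodes):
        if seen[start]:
            continue
        n_components += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(neighbor)
    return n_components


# Toy data: two evenly spaced groups of 10 points, far apart on a line,
# versus a single evenly spaced group of 20 points.
two_groups = np.concatenate([np.arange(10.0), np.arange(100.0, 110.0)]).reshape(-1, 1)
one_group = np.arange(20.0).reshape(-1, 1)

n_two = count_components(len(two_groups), knn_edges(two_groups, k=2))
n_one = count_components(len(one_group), knn_edges(one_group, k=2))
```

More than one component means no graph edge links the groups, so the distance drawn between them in the embedding carries no information from the graph.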

chlee-tabin commented 6 years ago

@falexwolf Just out of curiosity, have you compared your method with PHATE (https://www.biorxiv.org/content/early/2017/03/24/120378)? I have yet to try out PAGA, but I have found that PHATE works fairly well for trajectory inference. (I am just a biologist, so I don't know the specifics of comparing methodologies.)

jorvis commented 6 years ago

Thank you all for your feedback here - that was helpful. I'll close this so it doesn't look like an issue needs to be handled, but please, do continue any discussion.

falexwolf commented 6 years ago

@chlee-tabin Which method? PAGA? PAGA is for coarse-graining the data whereas PHATE is for embeddings, right?