kylemcdonald / Coloring-t-SNE

Exploration of methods for coloring t-SNE.

Leiden clustering #2

Open flying-sheep opened 5 years ago

flying-sheep commented 5 years ago

Hi! I think for what you’re doing, you might consider running a community detection algorithm on the high-dimensional data (or on a reduced version, e.g. a handful of PCA/ICA components, to speed things up).

They serve us very well in the computational biology world, much better than the primitive k-means or dbscan.

Until recently, we used Louvain community detection, but the author of the Python package recently published an improved version: https://github.com/vtraag/leidenalg

kylemcdonald commented 5 years ago

Thanks! This is the best example I could find that compares them: https://jmonlong.github.io/Hippocamplus/2018/02/13/tsne-and-clustering/ (comparison is to Louvain, but I'm assuming it would be similar).

I will try to reproduce this notebook with open data and add this algorithm when I do.

In general I'm more interested in continuous "labels" than clustering because I think our eyes can pick out more detail from the continuous variation, but I'm very curious what happens here.
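As a rough illustration of what a continuous "label" could look like (one possible approach, not necessarily what this repository does), the sketch below projects the data onto its first three principal components and rescales them to RGB values. The random data is a placeholder assumption, and only NumPy is required.

```python
import numpy as np

# Placeholder data (assumption: stands in for the real high-dimensional features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))

# Centre the data and compute principal directions via SVD.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project onto the first three principal components.
components = X_centered @ Vt[:3].T

# Rescale each component to [0, 1] so a row can be read as an (R, G, B) colour.
lo = components.min(axis=0)
hi = components.max(axis=0)
colors = (components - lo) / (hi - lo)
```

Each row of `colors` can then be passed as the `c=` argument of a matplotlib scatter plot of the 2-D embedding, so nearby points in the original space get similar colours without any hard cluster boundaries.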

flying-sheep commented 5 years ago

Yup, continuous labels are closer to the truth than clustering when there are a lot of continuous transitions going on.

However, in that case t-SNE isn’t a good choice. You could try UMAP for such data.

kylemcdonald commented 5 years ago

Why do you say UMAP is better for data with continuous variation than t-SNE? I have a lot of experience with both, but haven't seen or read anything to indicate this.

flying-sheep commented 5 years ago

I’m pretty surprised that you haven’t; the preservation of that kind of structure is one of UMAP’s main selling points. t-SNE rips things apart, UMAP doesn’t. Here’s the first Google hit for “umap vs tsne”, which says:

[…] notably highlighting faster runtime and consistency, meaningful organization of cell clusters and preservation of continuums in UMAP compared to t-SNE.