scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Extract tree structure from clustering #55

Closed tkosciol closed 8 years ago

tkosciol commented 8 years ago

I'm wondering if it is possible to get a tree-like structure of the resulting clustering. What I'm ultimately trying to do is to get a dendrogram overlaid on top of a heat map (e.g. seaborn.clustermap).

From looking at the source code, it seems like there's a condensed tree attribute and that there's a _raw_tree attribute, is that what I should be looking at?

cc @ElDeveloper

lmcinnes commented 8 years ago

There are a couple of tree structures. The first is the condensed tree. This has the clusters, with points falling out of them. You might have to do a little work to fit that with a heatmap, but it is doable. I would suggest you use the to_pandas method to get a dataframe and understand exactly what that data structure contains if you want to use it.

The second option is the single_linkage_tree, which has a to_numpy method that gives you a linkage array in exactly the same format that scipy's single linkage clustering produces (which is also what seaborn uses internally in its clustermap). That might be easier to use immediately.

I recommend the condensed tree, but as I say you might have to work through the details a bit to work out the best visualisation to go with a heatmap.

ElDeveloper commented 8 years ago

Great, thanks so much for the explanation, this makes a lot of sense! We were planning on using seaborn, so maybe using the single_linkage_tree will be the easiest.

lmcinnes commented 8 years ago

I should work out how to do the condensed tree properly and add it to seaborn with a pull request. In the meantime the single_linkage_tree should indeed do the job.

ElDeveloper commented 8 years ago

Great, thanks for the info. I would be interested in hearing more about that integration.
