shahsohil / DCC

This repository contains the source code and data for reproducing the results of the Deep Continuous Clustering paper
MIT License

Interpreting output as fuzzy clustering? #6

Closed LemonPi closed 5 years ago

LemonPi commented 5 years ago

Hi, I'm interested in using this algorithm as an intermediate step in a PyTorch pipeline, so I need the output of the clustering to be differentiable with respect to the input. The actual output of the algorithm is the positions of the representatives, which then get converted (non-differentiably) to cluster assignments by taking connected components over representatives whose distance is below a threshold.
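
For concreteness, the conversion step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `assign_clusters`, the all-pairs comparison, and the toy values are assumptions made here, and the actual implementation may only compare representatives along the edges of its own graph. The point is that the integer labels come from a discrete graph operation, so no gradient flows through them.

```python
import math

def assign_clusters(representatives, threshold):
    """Hard labels from connected components of the graph that links
    every pair of representatives closer than `threshold`.
    (Hypothetical helper; all-pairs comparison is an assumption.)"""
    n = len(representatives)
    parent = list(range(n))  # union-find forest, one node per point

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(representatives[i], representatives[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy representatives: two tight groups far from each other.
U = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
labels = assign_clusters(U, threshold=0.5)
print(labels)  # [0, 0, 1, 1]
```

The thresholding (`<`) and the component merge are both piecewise constant in the representative positions, which is why this step blocks backpropagation.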

Do you see a way to relax the cluster assignment so that I can differentiate through it? Ultimately the output has to be continuous numbers rather than indices/labels, so I'm thinking of a probability of being in each cluster? (But this conflicts with the fact that we don't specify the number of clusters.)

shahsohil commented 5 years ago

Hi LemonPi,

The actual output of DCC is the cluster representatives 'U'. These are completely differentiable. The whole purpose of this algorithm was differentiability, and hence DCC does not assign clusters until learning is completed.

You can plug in the DCC network and use the cluster representatives as input to the next step of your algorithm. Again, the advantage here is that you will not have to specify the number of clusters.

Let me know if this clears your doubt.

LemonPi commented 5 years ago

Hey Sohil, thanks for responding!

What I meant was that I know the representatives U are differentiable, but ideally what my algorithm does after clustering is to apply least squares inside each cluster. There's probably no differentiable way to get from the representatives to having cluster assignments (unless we try to fit a GMM to the representatives instead of the data points so that each point has a probability of belonging to a certain cluster - might be interesting but even if it works we lose the flexibility of not specifying the number of clusters?).

I was asking if you had any idea of how to approximate the effects of explicit cluster labels for the 'least squares inside each cluster' part of the algorithm just using the data point representatives.
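
One possible relaxation, sketched here only as an illustration and not something proposed in this thread or in the paper: replace the hard "same cluster" indicator with a smooth pairwise affinity computed from the distances between representatives, and fit a weighted least squares problem per point. The names `soft_affinity`, `local_least_squares`, `bandwidth`, and `ridge` are hypothetical, and the Gaussian kernel is an assumed choice. The sketch uses NumPy to show the forward computation; it relies only on elementwise arithmetic, `exp`, matrix products, and a linear solve, which all have differentiable PyTorch counterparts.

```python
import numpy as np

def soft_affinity(U, bandwidth):
    """Smooth stand-in for 'i and j share a cluster': close to 1 when two
    representatives coincide, close to 0 when they are far apart."""
    sq_dist = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def local_least_squares(X, y, W, ridge=1e-8):
    """For each point i, solve least squares with sample weights W[i].
    The small ridge term keeps the normal equations well conditioned."""
    n, d = X.shape
    coefs = np.empty((n, d))
    for i in range(n):
        Xw = X * W[i][:, None]               # scale each row by its weight
        A = X.T @ Xw + ridge * np.eye(d)     # weighted normal matrix
        b = Xw.T @ y                         # weighted right-hand side
        coefs[i] = np.linalg.solve(A, b)
    return coefs

# Toy data: representatives collapsed onto two locations,
# with y = 2x in the first group and y = -3x in the second.
U = np.array([[0.0, 0.0]] * 3 + [[10.0, 10.0]] * 3)
X = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0, -3.0, -6.0, -9.0])

W = soft_affinity(U, bandwidth=1.0)
coefs = local_least_squares(X, y, W)
```

When the representatives have collapsed into well-separated groups, each point's fit is dominated by its own group, so the coefficients approach the per-cluster least squares solutions without ever fixing the number of clusters. The trade-offs are one linear solve per point and a bandwidth that has to be chosen relative to the scale of the representatives.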