shahsohil / DCC

This repository contains the source code and data for reproducing the results of the Deep Continuous Clustering paper
MIT License

Clustering results in one cluster with 99.99% of the data #29

Closed ilyak93 closed 2 years ago

ilyak93 commented 3 years ago

Hi. I tried many different hyperparameter settings and all of the data preprocessing steps suggested in the previously closed issues, but could not resolve this problem. The number of clusters varied, sometimes being very large, but there was always one dominant cluster containing almost all of the data, with every other cluster holding a singleton or just a few examples. With some hyperparameters I got many almost-empty clusters besides the dominant one, and with others just a few; only the number of clusters changed, while the single dominant cluster alongside the almost-empty ones always remained. I also tried preprocessing that is not included in the original code, such as a standard scaler, as well as all of the preprocessing methods mentioned in the code, applying them in the "make_data" step. My data is temporal. I tried both architectures, with a small tweak to the convolutional one: I made it 1D. Attached are heat maps of the data before and after normalization: image

image
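
The symptom described above (one dominant cluster plus singletons) can be quantified directly from the cluster assignments, which makes runs with different hyperparameters easier to compare. The following is a minimal sketch, assuming the assignments are available as a non-empty flat list of integer labels; `summarize_clusters` and the `small` threshold are hypothetical names chosen for illustration and are not part of the DCC code.

```python
from collections import Counter


def summarize_clusters(labels, small=5):
    """Return size statistics for a non-empty flat list of cluster labels.

    `small` is an illustrative threshold: clusters with at most this many
    members are counted as "almost empty".
    """
    counts = Counter(labels)
    n = len(labels)
    dominant_label, dominant_size = counts.most_common(1)[0]
    return {
        "n_clusters": len(counts),
        "dominant_label": dominant_label,
        "dominant_fraction": dominant_size / n,
        "n_small_clusters": sum(1 for c in counts.values() if c <= small),
    }


# Toy labels mimicking the reported behaviour: one huge cluster, three singletons.
labels = [0] * 9997 + [1, 2, 3]
stats = summarize_clusters(labels)
print(stats)
```

A `dominant_fraction` close to 1 together with a high `n_small_clusters` corresponds to the degenerate result reported here.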

I suspect that these architectures are not suitable for this data. Otherwise I have no explanation other than that the data is not separable, but that would be strange, because even applying a simple dimensionality reduction technique such as PCA and plotting the result with t-SNE shows that there are some clusters. This is a really hard issue, given that I have tried everything except totally new architectures.
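
For readers who want to reproduce this kind of sanity check, here is a minimal sketch of the standard scaling and PCA steps using only NumPy. It is an assumption about how such a check might look, not the code used in this issue: the helper names `standardize` and `pca_project` are hypothetical, and the data is a synthetic stand-in. The t-SNE step is omitted; the projected output could be passed to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE` for plotting.

```python
import numpy as np


def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features become all zeros instead of NaN
    return (X - mean) / std


def pca_project(X, n_components=2):
    """Project X onto its top principal components via SVD.

    Returns the projected data and the explained-variance ratio
    of the kept components.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for real data: two well-separated groups in 10 dimensions.
X = np.vstack([
    rng.normal(0.0, 1.0, (100, 10)),
    rng.normal(6.0, 1.0, (100, 10)),
])
Z, ratio = pca_project(standardize(X))
print(Z.shape, ratio)
```

If the two groups fall on opposite sides of the first component, as they do for this synthetic data, the structure is at least linearly visible, which is the kind of evidence of separability referred to above.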