open-connectome-classes / StatConn-Spring-2015-Info

Number of clusters in cluster evaluation #177

Open dlee138 opened 9 years ago

dlee138 commented 9 years ago

How would predetermining the number of clusters affect the effectiveness of different clustering algorithms?

ghost commented 9 years ago

I'm not sure I'm reading this the right way, but if you predetermine the number of clusters, doesn't that just cut the computation down to a single run, since the algorithm no longer has to try several cluster counts to find the right one?

mrjiaruiwang commented 9 years ago

You could try a cross-validation scheme (like k-fold CV) over different numbers of clusters. There will likely be a bias-variance tradeoff, though, so you either have to make a judgment call or pick the lowest-bias option whose variance stays under some threshold.
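
A minimal sketch of the cross-validation idea above, not anyone's actual method from this thread. It assumes plain k-means on synthetic 2-D points (three well-separated blobs) and uses held-out distortion, the mean squared distance from each test point to its nearest fitted center, as the score. The helper names `kmeans`, `heldout_distortion`, and `make_blobs` are made up for illustration, and everything is standard-library Python.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next seed: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def heldout_distortion(points, k, folds=5, seed=0):
    """k-fold CV: fit centers on the training folds, score on the held-out fold."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    total = 0.0
    for f in range(folds):
        test = shuffled[f::folds]
        train = [p for i, p in enumerate(shuffled) if i % folds != f]
        centers = kmeans(train, k, rng)
        total += sum(min(dist2(p, c) for c in centers) for p in test)
    return total / len(points)

def make_blobs(seed=1, per_blob=40):
    """Synthetic data (an assumption for this sketch): three Gaussian blobs."""
    rng = random.Random(seed)
    true_centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
    return [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in true_centers
        for _ in range(per_blob)
    ]

data = make_blobs()
scores = {k: heldout_distortion(data, k) for k in range(1, 7)}
for k, score in scores.items():
    print(k, round(score, 3))
```

With this kind of score, the held-out distortion typically drops sharply up to the true number of clusters and then flattens rather than turning back up, so in practice you would look for the elbow instead of the strict minimum. That is where the judgment call comes in.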

DSP137 commented 9 years ago

By predetermining the number of clusters, we give the algorithm a bit more to work with, because it knows how many groups you want the nodes clustered into. However, as @mrjiaruiwang stated, this could introduce bias (if you use fewer clusters than the network actually has) or variance (if you use more).

mblohr commented 9 years ago

Issue #104 (Problem of Overfitting Data) might be useful.