Open dlee138 opened 9 years ago
I'm not sure I'm reading this the right way, but if you predetermine the number of clusters, doesn't that just cut the computation from one run of the clustering algorithm per candidate cluster count down to a single run?
You could try a cross-validation scheme (like k-fold CV) over different numbers of clusters. There may turn out to be a bias-variance tradeoff, though, in which case you either have to make a judgement call or pick the lowest-bias option that keeps variance under some threshold.
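To make the k-fold idea concrete, here is a rough sketch, not anything from the course material. It assumes plain k-means on synthetic 2-D points as a stand-in for whatever clustering you'd run on the network, and every name in it (`kmeans`, `distortion`, `cv_distortion`, the toy `data`) is made up for illustration. For each candidate k it fits centroids on the training folds and measures how far the held-out points sit from their nearest centroid.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; initial centroids are k random data points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids

def distortion(points, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)

def cv_distortion(points, k, folds=5, seed=0):
    """Average held-out distortion over `folds` train/test splits."""
    pts = points[:]
    random.Random(seed).shuffle(pts)
    scores = []
    for f in range(folds):
        test = pts[f::folds]
        train = [p for i, p in enumerate(pts) if i % folds != f]
        scores.append(distortion(test, kmeans(train, k, seed=seed)))
    return sum(scores) / folds

# Toy data (an assumption for this sketch): three well-separated blobs.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(40)
]

scores = {k: cv_distortion(data, k) for k in range(1, 7)}
for k, s in scores.items():
    print(f"k={k}: held-out distortion {s:.3f}")
```

Held-out distortion usually keeps shrinking as k grows, so you'd look for where it levels off rather than for a minimum, which is the judgement call mentioned above. Predetermining k just means skipping the loop and calling `kmeans` once.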
By predetermining the number of clusters, we give the algorithm a bit more to work with, because it knows how many groups you want the nodes clustered into. However, as @mrjiaruiwang stated, this could introduce bias or variance, depending on whether you use fewer or more clusters than the network actually has.
Issue #104 (Problem of Overfitting Data) might be useful.
How would predetermining the number of clusters affect the effectiveness of different clustering algorithms?