Open dlee138 opened 9 years ago
Classification tries to decide, from what we can observe, which label to attach to an observation, using a prior set of known label-observation pairs. Clustering tries to figure out what the clusters of a graph are. Both are similar in that they try to estimate something unknown, but I don't know if the similarities go much further than that.
As I understand it, the difference hinges on what information you have about the data before running the algorithm. In clustering, there are no labels and so the problem is one of unsupervised learning. In classification, by contrast, there are labels for the data points and so the problem is one of supervised learning.
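To make the supervised/unsupervised split concrete, here is a minimal sketch in plain Python. The function names (`fit_nearest_centroid`, `classify`, `two_means`) and the toy 1-D data are my own assumptions for illustration, not anything from the paper. The classifier needs the known labels to fit; the clustering routine sees only the points and returns arbitrary group indices.

```python
from statistics import mean

def fit_nearest_centroid(points, labels):
    """Supervised: uses the known labels to compute one centroid per label."""
    groups = {}
    for x, y in zip(points, labels):
        groups.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in groups.items()}

def classify(centroids, x):
    """Attach the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(points, iterations=10):
    """Unsupervised: no labels given; groups only get arbitrary indices 0 and 1."""
    centers = [min(points), max(points)]  # simple deterministic initialization
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignment = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            for x in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [x for x, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = mean(members)
    return assignment

# Hypothetical toy data.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_nearest_centroid(points, labels)  # needs labels
print(classify(centroids, 1.1))                   # a named label comes out
print(two_means(points))                          # only group indices come out
```

Note that the clustering output has no names attached: whether group 0 means "small" or "large" is something you would have to interpret afterwards.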
Here's an MIT OCW ppt that explores this issue a bit:
Classification relates to binary or categorical data, as opposed to clustering, which is done with quantitative measures. Have you seen instances where both classification and clustering can be used on the same data set?
In SBMs, the latent variable Y can be viewed as a "classification" label of a particular "cluster" in the SBM dataset.
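A rough sketch of that idea, assuming a two-block model with made-up edge probabilities: each node gets a latent block label in `Y`, and the chance of an edge depends only on the two endpoints' blocks. The names `sample_sbm` and `B` and the specific numbers are hypothetical, chosen just for illustration.

```python
import random

def sample_sbm(block_of, edge_prob, seed=0):
    """Sample an undirected graph where P(edge i-j) depends only on the blocks of i and j."""
    rng = random.Random(seed)
    n = len(block_of)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # no self-loops; fill both halves symmetrically
            p = edge_prob[block_of[i]][block_of[j]]
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

# Latent labels: in practice these are hidden and only the graph is observed.
Y = [0, 0, 0, 1, 1, 1]
# Assumed block probabilities: dense within a block, sparse between blocks.
B = [[0.9, 0.1],
     [0.1, 0.9]]

A = sample_sbm(Y, B)
for row in A:
    print(row)
```

Recovering `Y` from `A` alone is the clustering problem; if some entries of `Y` were known in advance, predicting the rest would look more like classification.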
Classification puts labels on data; clustering puts data into groups that do not necessarily have a label.
How is a classification problem, as discussed in the data mining paper, different from a clustering problem?