Classification vs. clustering

open-connectome-classes / StatConn-Spring-2015-Info

introductory material


Classification vs. clustering #191

Open dlee138 opened 9 years ago

dlee138 commented 9 years ago

How is a classification problem, as discussed in the data mining paper, different from a clustering problem?

mrjiaruiwang commented 9 years ago

Classification tries to decide which label to attach to an observation, based on what we can observe and on a prior set of known label-observation pairs. Clustering tries to figure out what the clusters of a graph are. Both are similar in that they try to estimate something unknown, but I don't know if the similarities go much further than that.

whock commented 9 years ago

As I understand it, the difference hinges on what information you have about the data before running the algorithm. In clustering, there are no labels and so the problem is one of unsupervised learning. In classification, by contrast, there are labels for the data points and so the problem is one of supervised learning.

Here's an MIT OCW ppt that explores this issue a bit:

http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-047-computational-biology-genomes-networks-evolution-fall-2008/lecture-notes/MIT6_047f08_lec04_slide04.pdf
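
To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy points, the labels "a" and "b", and the helper names (`fit_classifier`, `classify`, `kmeans`) are made up, and nearest-centroid classification and k-means are just two representative algorithms, not the ones from the data mining paper. The point is only that the classifier is given the labels, while the clustering routine sees the points alone and returns arbitrary group ids.

```python
import math

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def fit_classifier(points, labels):
    """Supervised: uses the known labels to compute one centroid per label."""
    by_label = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    return {y: centroid(ps) for y, ps in by_label.items()}

def classify(point, model):
    """Return the label whose centroid is closest to point."""
    names = list(model)
    return names[nearest(point, [model[y] for y in names])]

def kmeans(points, k, iters=10):
    """Unsupervised: sees only the points and returns arbitrary group ids."""
    # Deterministic farthest-point initialization (an assumption made here
    # so the sketch does not depend on random seeds).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [nearest(p, centers) for p in points]

# Hypothetical toy data: two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = ["a", "a", "a", "b", "b", "b"]

model = fit_classifier(points, labels)   # classification: labels are given
predicted = classify((0.3, 0.2), model)  # a named label for a new point

cluster_ids = kmeans(points, 2)          # clustering: no labels are given
```

Note that `classify` returns one of the label names it was trained on, whereas the ids returned by `kmeans` only say which points belong together; which group gets id 0 or 1 carries no meaning.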

ajulian3 commented 9 years ago

Classification relates to binary or categorical data, as opposed to clustering, which is done with quantitative measures. Have you seen instances where both classification and clustering can be used on the same data set?

mblohr commented 9 years ago

In stochastic block models (SBMs), the latent variable Y can be viewed as a "classification" label of a particular "cluster" in the SBM dataset.
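
A rough sketch of that idea, assuming a two-block SBM with made-up parameters: the block probabilities in `B`, the block sizes, and the helper names `sample_sbm` and `edge_density` are all hypothetical. Each vertex gets a latent label in `Y`, and edges are drawn independently with a probability that depends only on the labels of the two endpoints. Vertices sharing a label end up densely connected, which is the "cluster" that a clustering method would try to recover from the graph alone.

```python
import random

def sample_sbm(y, B, seed=0):
    """Sample an undirected, loop-free adjacency matrix given block labels y."""
    rng = random.Random(seed)
    n = len(y)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on the two latent labels.
            if rng.random() < B[y[i]][y[j]]:
                A[i][j] = A[j][i] = 1
    return A

def edge_density(A, y, same_block):
    """Fraction of vertex pairs joined by an edge, within or between blocks."""
    edges = pairs = 0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            if (y[i] == y[j]) == same_block:
                pairs += 1
                edges += A[i][j]
    return edges / pairs

# Hypothetical parameters: 20 vertices per block, dense within, sparse between.
Y = [0] * 20 + [1] * 20
B = [[0.7, 0.05],
     [0.05, 0.7]]

A = sample_sbm(Y, B)
within = edge_density(A, Y, same_block=True)
between = edge_density(A, Y, same_block=False)
```

If Y is observed for some vertices, predicting it for the rest is a classification problem; if Y is entirely hidden, estimating it from A is a clustering problem.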

adjordan commented 9 years ago

Classification puts labels on data; clustering puts data into groups that do not necessarily have a label.