Blob Hierarchical / K-Means Clustering

sightmachine / SimpleCV

The Open Source Framework for Machine Vision

http://simplecv.org

BSD 3-Clause "New" or "Revised" License


Blob Hierarchical / K-Means Clustering #243

Closed: kscottz closed this issue 11 years ago

kscottz commented 12 years ago

Create a featureset function for blobs that lets us cluster blobs based on their position, color, shape, or the output of a feature extractor. The method should support either hierarchical clustering or k-means. The results should be returned as a list of featuresets.

vijaym123 commented 11 years ago

I am familiar with the k-means algorithm and plan to implement it using the scikit-learn package.

kmeansClustering(k, properties)
Here 'k' is the number of clusters to produce.
The 'properties' parameter accepts a list of strings. For example, to cluster on just color and shape, pass properties = [ "color", "shape" ].
The function returns a list of featuresets, one per cluster.
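A minimal sketch of the proposed kmeansClustering(k, properties) idea, written in pure Python so it runs without dependencies. The commenter planned to use scikit-learn, where `sklearn.cluster.KMeans(n_clusters=k).fit_predict(X)` would replace the hand-written loop. Assumptions: blobs are plain dicts with hypothetical keys `centroid`, `avg_color` and `hu` standing in for SimpleCV blob attributes; the names `kmeans_clustering` and `FEATURE_EXTRACTORS` and the farthest-first initialisation are illustrative choices, not SimpleCV API.

```python
# Sketch only: blobs are plain dicts with hypothetical keys standing in for
# SimpleCV Blob attributes (centroid, average color, Hu moments).
FEATURE_EXTRACTORS = {
    "position": lambda blob: list(blob["centroid"]),  # (x, y)
    "color": lambda blob: list(blob["avg_color"]),    # (r, g, b)
    "shape": lambda blob: list(blob["hu"]),           # 7 Hu moments
}


def feature_vector(blob, properties):
    """Concatenate the requested properties into one flat vector."""
    vec = []
    for name in properties:
        vec.extend(FEATURE_EXTRACTORS[name](blob))
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_clustering(blobs, k, properties, iterations=100):
    """Lloyd's k-means over the chosen properties.

    Returns a list of non-empty blob lists, one per cluster.
    """
    points = [feature_vector(b, properties) for b in blobs]
    # Deterministic farthest-first initialisation: start from the first point,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[i])  # empty cluster keeps its center
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    groups = [[] for _ in range(k)]
    for blob, label in zip(blobs, labels):
        groups[label].append(blob)
    return [g for g in groups if g]
```

In SimpleCV each returned list would presumably be wrapped in a FeatureSet. Mixing properties such as `["color", "shape"]` puts values on very different scales into one vector, so a fuller implementation would likely normalise each property before concatenating.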

Please check whether I am correct on the following points:

For "color", it should internally use mAvgColor to cluster.
For "shape", the contour points are used, right? But mContour gives a different number of points for each blob. How do we account for this when clustering?
For "position", it is just (x, y).

Can you please explain this a bit more? Thank you.

kscottz commented 11 years ago

So hierarchical clustering is just like k-means, but the algorithm finds k automagically. For color use avgColor, for shape use the seven Hu moments, and for position just use (x, y). Basically avgColor, the Hu moments, and position are all feature vectors, so you may want an abstraction for which metric you use for clustering. For example, let's say you pull out the following positions:

(0, 0) and (10, 10)
(10, 100) and (0, 100)
(100, 10) and (100, 0)
(100, 100) and (90, 90)

What I would like to see is the blob featureset broken into four smaller featuresets, grouped by position. Does this help clarify?
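The suggestion of Hu moments for shape also answers the earlier question about contours of differing length: however many pixels or contour points a blob has, they reduce to exactly seven numbers that are invariant to translation and rotation and, up to pixel discretisation, to scale. OpenCV exposes this as `cv2.HuMoments(cv2.moments(...))`. The sketch below computes the standard seven invariants directly, assuming the blob's foreground pixel coordinates are available as a list of (x, y) pairs; the function name `hu_moments` is hypothetical.

```python
def hu_moments(pixels):
    """Seven Hu moment invariants of a binary shape.

    `pixels` is an iterable of (x, y) coordinates of the blob's foreground pixels.
    """
    pts = list(pixels)
    m00 = len(pts)  # zeroth moment = area for a binary mask
    xbar = sum(x for x, _ in pts) / m00
    ybar = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment mu_pq, normalised by area for scale invariance.
        mu = sum((x - xbar) ** p * (y - ybar) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03  # sums that recur in h4..h7
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]
```

Because the higher-order invariants are often many orders of magnitude smaller than the first, they are commonly log-scaled before being used as a clustering feature.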

vijaym123 commented 11 years ago

Yes, I've got it. I will implement k-means first and later try hierarchical clustering.
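A sketch of the hierarchical half of the plan, using agglomerative (bottom-up) clustering on kscottz's example positions: every point starts as its own cluster, the two closest clusters are merged repeatedly, and merging stops once the closest pair is farther apart than a threshold. The number of clusters therefore comes from where the dendrogram is cut rather than being passed in. The names `hierarchical_clustering`, `threshold`, `metric` and `linkage` are hypothetical; passing `min` or `max` as `linkage` gives single or complete linkage, and `metric` is any distance callable.

```python
import math


def euclidean(a, b):
    """Straight-line distance (math.dist needs Python 3.8+)."""
    return math.dist(a, b)


def manhattan(a, b):
    """City-block distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hierarchical_clustering(points, threshold, metric=euclidean, linkage=min):
    """Agglomerative clustering that stops merging above `threshold`.

    Returns (clusters, merges): clusters is a list of index lists, and merges
    records each (cluster_a, cluster_b, distance) join below the cut.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        # linkage=min -> single linkage, linkage=max -> complete linkage
        return linkage(metric(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        d, ia, ib = min(
            (cluster_distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break  # cut the dendrogram here: remaining clusters are too far apart
        merges.append((clusters[ia], clusters[ib], d))
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected
    return clusters, merges
```

With the eight example positions in the order listed and `threshold=50`, this returns the clusters `[[0, 1], [2, 3], [4, 5], [6, 7]]`, i.e. the four groups described above; `metric=manhattan` gives the same grouping. The naive pair search is roughly O(n³) overall, which is fine for a handful of blobs but not for thousands of keypoints.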

xamox commented 11 years ago

Do you think the code I wrote for keypoint clustering could be reused for this?

kscottz commented 11 years ago

I peeked at this code the other day. It is pretty good! I still want to do a bit of refactoring (mainly to use the ROI class and do some outlier pruning). The hierarchical clustering is good, but I would like to see a bit more flexibility in how the dendrogram gets cut. To that end, we should allow the user to select the distance metric programmatically.
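The comment does not say which outlier-pruning rule is intended. The sketch below assumes one simple possibility: drop points whose distance to the cluster's coordinate-wise median exceeds `factor` times the median of those distances. The distance metric is passed in as a callable, which also illustrates selecting the metric programmatically. The names `prune_outliers` and `factor` are hypothetical.

```python
import math
import statistics


def prune_outliers(points, factor=3.0, metric=math.dist):
    """Drop points unusually far from the cluster's coordinate-wise median.

    A point is kept if its distance to the median center is at most
    `factor` times the median of all such distances.
    """
    # Coordinate-wise median: robust to a few far-away points.
    center = [statistics.median(col) for col in zip(*points)]
    dists = [metric(p, center) for p in points]
    cutoff = factor * statistics.median(dists)
    return [p for p, d in zip(points, dists) if d <= cutoff]
```

A median-based rule is used here because a single far-away point barely shifts a median, whereas it can inflate a mean and standard deviation enough to hide itself. If most points coincide with the center, the median distance is zero and only points exactly on the center survive, so a production version would need to guard that case.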