sightmachine / SimpleCV

The Open Source Framework for Machine Vision
http://simplecv.org
BSD 3-Clause "New" or "Revised" License

Blob Hierarchical / K-Means Clustering #243

Closed kscottz closed 11 years ago

kscottz commented 11 years ago

Create a featureset function for blobs that lets us cluster blobs by position, color, shape, or the output of a feature extractor. The method should support both hierarchical clustering and k-means. The results should be returned as a list of featuresets.

vijaym123 commented 11 years ago

I am familiar with the k-means algorithm and thought of implementing it using the scikit-learn package.

Please check whether I am correct on the following points:

Could you please explain this a bit more? Thank you.

kscottz commented 11 years ago

So hierarchical clustering is just like k-means, but the algorithm finds k automagically. For color use avgColor, for shape use seven or so Hu moments, and for position just use x,y. Basically avgColor, Hu moments, and position are all feature vectors, so you may want an abstraction for which metric you use for clustering. For example, let's say you pull out the following positions:

(0, 0), (10, 10)

(10, 100), (0, 100)

(100, 10), (100, 0)

(100, 100), (90, 90)

What I would like to see is the blob featureset broken into four smaller featuresets, grouped by position. Does this help clarify?
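To make the grouping concrete, here is a minimal pure-Python sketch of k-means on the eight example positions. It is only an illustration: the names `kmeans`, `farthest_point_init`, and `sq_dist` are hypothetical, the feature vectors are plain tuples rather than real blobs, and the farthest-point initialization is just one simple choice that keeps this toy example deterministic. A real implementation would more likely lean on a library such as scikit-learn.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Pick k starting centers: the first point, then repeatedly the
    point farthest from all centers chosen so far (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns k lists of points."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest center.
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return [[p for p, lab in zip(points, labels) if lab == i] for i in range(k)]


# The eight example positions from the comment above.
positions = [(0, 0), (10, 10), (10, 100), (0, 100),
             (100, 10), (100, 0), (100, 100), (90, 90)]

groups = kmeans(positions, 4)
for group in groups:
    print(group)
```

Because `kmeans` only sees tuples of numbers, the same function could in principle cluster on average color or Hu moments instead of position, as long as each blob is first mapped to a feature vector.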

vijaym123 commented 11 years ago

Yes, I got it. I will solve it for k-means first and try hierarchical clustering later.

xamox commented 11 years ago

Do you think the stuff I used for keypoint clustering can be used for this?

kscottz commented 11 years ago

I peeked at this stuff the other day. It is pretty good! I still want to do a bit of refactoring (mainly to use the ROI class and do some outlier pruning). The hierarchical clustering is good, but I would like to see a bit more flexibility in how the dendrogram gets cut up. To that end we should allow the user to select the distance metric programmatically.
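As a rough illustration of what a selectable metric and a configurable cut could look like, here is a naive single-linkage agglomerative sketch in pure Python. This is not the keypoint clustering code discussed above and not a SimpleCV API: the names `agglomerate`, `cut`, and `metric` are hypothetical, and single linkage is just one assumed linkage rule. Merging stops once the closest pair of clusters is farther apart than `cut`, which is equivalent to cutting the dendrogram at that height.

```python
def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """City-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerate(points, cut, metric=euclidean):
    """Naive single-linkage clustering with a distance cut.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until the closest pair is farther apart than `cut`.
    `metric` is any callable taking two vectors and returning a number.
    """
    clusters = [[p] for p in points]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(metric(a, b) for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut:
            break  # every remaining merge would cross the cut height
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


positions = [(0, 0), (10, 10), (10, 100), (0, 100),
             (100, 10), (100, 0), (100, 100), (90, 90)]

by_euclidean = agglomerate(positions, cut=15, metric=euclidean)
by_manhattan = agglomerate(positions, cut=15, metric=manhattan)
print(len(by_euclidean), len(by_manhattan))
```

With the same cut, the two metrics give different cluster counts on the example positions, because the diagonal pairs such as (0, 0) and (10, 10) are about 14.1 apart in Euclidean distance but 20 apart in Manhattan distance. That is the kind of flexibility a user-supplied metric would expose.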