sjwhitworth / golearn

Machine Learning for Go
MIT License

K Means clustering #26

Open sjwhitworth opened 10 years ago

hpxro7 commented 10 years ago

I would be happy to work on this.

lazywei commented 10 years ago

Hi @hpxro7, it would be great if you could fork this repo and open a new branch for this feature. Once you've finished, just send us a pull request! :beer:

Sentimentron commented 10 years ago

Quick question: what is our clustering interface going to look like? I was thinking of introducing a SetAttribute, but this requires support within Instances for types longer than 64 bits, and I don't have time to refactor the code right now.

hpxro7 commented 10 years ago

Hi @lazywei, absolutely. I'll be getting on that now :+1:!

@Sentimentron Could you briefly expand on the purpose of SetAttribute? I'm assuming the clustering algorithms will adhere to the Estimator and Predictor interfaces.

I had some questions myself :). I may be misunderstanding the type, but is Instances meant exclusively for data with class labels, or should ClassIndex simply be omitted for unsupervised learning?

Sentimentron commented 10 years ago

So I thought there might be a few possibilities for what gets returned from a clustering algorithm.

     cluster (IntAttribute)    members (SetAttribute)
     1                         1, 2, 3, 4
     2                         4, 5, 6, 7
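For illustration only, here is a minimal Go sketch of how a cluster/members layout like the one above could be built from per-row assignments. `GroupByCluster` is a hypothetical helper, not part of golearn, and it assumes hard assignments, so each row ends up in exactly one cluster.

```go
package main

import "sort"

// GroupByCluster is a hypothetical helper (not a golearn API). It inverts a
// row -> cluster assignment into cluster -> member rows, sorting each member
// list so the result does not depend on Go's random map iteration order.
func GroupByCluster(assignments map[int]int) map[int][]int {
	members := make(map[int][]int)
	for row, cluster := range assignments {
		members[cluster] = append(members[cluster], row)
	}
	for cluster := range members {
		sort.Ints(members[cluster])
	}
	return members
}
```

Calling `GroupByCluster(map[int]int{1: 1, 2: 1, 3: 1, 5: 2, 6: 2})` groups rows 1, 2, 3 under cluster 1 and rows 5, 6 under cluster 2.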

WRT ClassIndex, the next batch of work I'm planning will allow more than one class attribute, or none at all, as per @lazywei's suggestion. For now, I'd probably check whether ClassIndex is set to -1 and, if it isn't, ignore that attribute.

Edit: Also, I will need to implement an IntAttribute, so I'll see if I can get that done today.
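The ClassIndex check described above might look roughly like the following sketch. It works on plain `[]float64` rows rather than golearn's Instances type. `NoClassIndex` and `FeatureVector` are hypothetical names introduced here for illustration, and -1 is taken as the "no class attribute" sentinel as suggested in this comment.

```go
package main

// NoClassIndex is the sentinel assumed here to mean "no class attribute".
const NoClassIndex = -1

// FeatureVector is a hypothetical helper (not a golearn API). It returns a
// copy of row without the value at classIndex, so a clustering algorithm
// only sees the remaining attributes. When classIndex is NoClassIndex,
// every value is kept.
func FeatureVector(row []float64, classIndex int) []float64 {
	features := make([]float64, 0, len(row))
	for i, v := range row {
		if classIndex != NoClassIndex && i == classIndex {
			continue // skip the class attribute
		}
		features = append(features, v)
	}
	return features
}
```

Copying the row keeps the original data untouched, at the cost of one allocation per row.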

Sentimentron commented 10 years ago

OK, so IntAttribute is implemented in #39.

hpxro7 commented 10 years ago

Great, thanks a lot for the clarifications.

Until we have SetAttribute, I'll implement K-Means' Predict to return a map from row numbers to clusters.
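As a rough sketch of what a row-number-to-cluster map could look like, here is a small standalone K-Means (Lloyd's algorithm) in Go. It is not the golearn implementation and does not use the Estimator or Predictor interfaces. The `KMeans` function, its parameters, and the choice to seed the centroids with the first k rows are all assumptions made to keep the example short and deterministic.

```go
package main

import "math"

// sqDist returns the squared Euclidean distance between a and b.
// Both slices are assumed to have the same length.
func sqDist(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// nearest returns the index of the centroid closest to point.
// Ties go to the lowest index.
func nearest(point []float64, centroids [][]float64) int {
	best, bestDist := 0, math.Inf(1)
	for c, centroid := range centroids {
		if d := sqDist(point, centroid); d < bestDist {
			best, bestDist = c, d
		}
	}
	return best
}

// KMeans is a hypothetical sketch (not a golearn API). It clusters rows into
// k groups and returns a map from row number to cluster index in [0, k).
// Centroids start as copies of the first k rows (an assumption made for
// determinism). It returns an empty map if k is out of range or maxIter < 1.
func KMeans(rows [][]float64, k, maxIter int) map[int]int {
	assign := make(map[int]int, len(rows))
	if k <= 0 || k > len(rows) {
		return assign
	}
	centroids := make([][]float64, k)
	for c := range centroids {
		centroids[c] = append([]float64(nil), rows[c]...)
	}
	for iter := 0; iter < maxIter; iter++ {
		// Assignment step: attach every row to its nearest centroid.
		changed := false
		for i, row := range rows {
			c := nearest(row, centroids)
			if prev, ok := assign[i]; !ok || prev != c {
				assign[i] = c
				changed = true
			}
		}
		if !changed {
			break // converged: no row switched cluster
		}
		// Update step: move each centroid to the mean of its members.
		sums := make([][]float64, k)
		counts := make([]int, k)
		for c := range sums {
			sums[c] = make([]float64, len(rows[0]))
		}
		for i, row := range rows {
			c := assign[i]
			counts[c]++
			for j, v := range row {
				sums[c][j] += v
			}
		}
		for c := range centroids {
			if counts[c] == 0 {
				continue // leave the centroid of an empty cluster in place
			}
			for j := range centroids[c] {
				centroids[c][j] = sums[c][j] / float64(counts[c])
			}
		}
	}
	return assign
}
```

For example, `KMeans([][]float64{{0, 0}, {10, 10}, {0, 1}, {10, 11}}, 2, 10)` places rows 0 and 2 in one cluster and rows 1 and 3 in the other. A map keeps the return type simple until a set-valued attribute exists, though a real implementation would likely want random or k-means++ style seeding instead of the first k rows.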

Sentimentron commented 10 years ago

OK, that sounds good.