V.01 release - Githubissues

sjwhitworth commented 10 years ago

Hi everyone. I'd like to formalise what features we want for a V.01 release. What I mean by this is, is the first version of GoLearn that is nearly ready for production use externally. We'll learn much more when it's in the hands of users. Docs need to be improved substantially, and we need a few more implementations of algorithms.

What does everyone think?

cc: @ifesdjeen @npbool @macmania @lazywei @marcoseravalli

sjwhitworth commented 10 years ago

@e-dard @Sentimentron

Sentimentron commented 10 years ago

At a minimum, I'd expect

ID3/C4.5 Decision tees
Naive Bayes
Discretisation
Random Forests

Discretisation and random forests I've got working, working on ID3 now.

lazywei commented 10 years ago

I think basic linear models are required: logistic regression, linear regression. SVM integration would be great (w/ libsvm). Cross validation is also essential.

@Sentimentron I'm not sure in what case we will need to use discretization?

Sentimentron commented 10 years ago

ID3, as an example, only works on categorical attributes (C4.5 relaxes this restriction but it's more complex to implement). Similarly, you have to use Gaussian Naive Bayes if you want to handle continuous attributes (it's underlying assumption - that continuous attributes are normally distributed - is not always true).

e-dard commented 10 years ago

I think, rather than focussing on features the library needs to reach a specific bar, it's healthier to merely order the features we want in an order to tackle them.

I have an old naive Bayes implementation in Python I could port over as a first step. Could also look at implementing GNB if people think it's important after that.

One class of algorithms that are missing, which I have a few Go implementations of, are Multi-armed Bandits. A very useful reinforcement learning technique. Would be happy to port these into the library.

sjwhitworth commented 10 years ago

@Sentimentron: Random forests would be great. I think that someone had already started to implement Naive Bayes..

@e-dard: Agreed. I just think it's useful to have some idea of 'minimal stable functionality' before we start promoting it more widely.

e-dard commented 10 years ago

Is someone working on naive bayes? I didn't see anything explicit in the issues list? Was working on a port of my Python implementation.

sjwhitworth commented 10 years ago

This is what I've seen so far, but it seems pretty nascent. https://github.com/tncardoso/golearn/tree/feature/naive/naive

Maybe it would be good to sync alongside him.

sjwhitworth commented 10 years ago

Any more thoughts? I think:

Random Forests
Naive Bayes
Stochastic/batch gradient descent
Cross Validation
Linear Regression
Logistic Regression

would be a great first start.

Sentimentron commented 10 years ago

So we now have:

Random Forests
Some support for SGD
Some support for cross validation
@tncardoso's Naive Bayes is looking good

Just leaving

Linear regression
Logistic regression

Is the end of June a good target?

sjwhitworth commented 10 years ago

That sounds good to me. Logistic regression should be ready to merge after @npbool makes some changes. That only leaves linear regression.

Sentimentron commented 10 years ago

I think we've actually merged everything in that list.

sjwhitworth commented 10 years ago

Reckon we're ready to go for a first proper release? Brilliant work @Sentimentron + all.

Sentimentron commented 10 years ago

Are we going to tag before or after #62?

sjwhitworth commented 10 years ago

I don't mind. It all looked good to me.

sjwhitworth / golearn

V.01 release #28