macmania opened this issue 10 years ago
I'd rather write them ourselves. That way, we can better optimise to any advantages that Go will give us - e.g. using goroutines to independently train random forests. Plus, I could do with the practice :)
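To make the goroutine point concrete, here's a minimal sketch of training forest trees concurrently, one goroutine per tree. The `trainTree`/`trainForest` names are hypothetical placeholders, not anything in golearn; a real version would fit each tree on a bootstrap sample.

```go
package main

import (
	"fmt"
	"sync"
)

// trainTree stands in for fitting one decision tree on a bootstrap
// sample; here it just returns the tree's index as a placeholder.
func trainTree(id int) int {
	return id
}

// trainForest fits n trees concurrently, one goroutine per tree.
func trainForest(n int) []int {
	trees := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			trees[i] = trainTree(i) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return trees
}

func main() {
	fmt.Println(len(trainForest(10))) // 10
}
```

Since each goroutine writes a distinct slice index, no mutex is needed around `trees`.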
Then I think we may need to implement some basic matrix computation algorithms first. If we want to rebuild whole algorithms, we will definitely need some fundamental infrastructure.
Yes, I agree. There has been a good start on a matrix library that is imported in this library, but it's probable that we will need to do a lot more ourselves. Eigenvectors, etc.
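To give a flavour of the kind of fundamental infrastructure involved, here is a minimal dense matrix product in pure Go. This is just a sketch of the simplest primitive; a real library would also want BLAS-style routines, decompositions, eigenvectors, and so on.

```go
package main

import "fmt"

// matMul computes the product of an m×n matrix a and an n×p matrix b,
// both stored as row-major [][]float64.
func matMul(a, b [][]float64) [][]float64 {
	m, n, p := len(a), len(b), len(b[0])
	out := make([][]float64, m)
	for i := 0; i < m; i++ {
		out[i] = make([]float64, p)
		for k := 0; k < n; k++ { // k in the middle loop is cache-friendlier
			for j := 0; j < p; j++ {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{5, 6}, {7, 8}}
	fmt.Println(matMul(a, b)) // [[19 22] [43 50]]
}
```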
If we want to do so many things ourselves, we should probably think about a better way to organize the packages/subpackages, etc. Any thoughts on this?
I'm agnostic. I'm willing to take suggestions on it, or just to start writing things, and continually refactor.
Can you work on the documentation part of the project, just to give the people working on it an idea of how to run and test a particular component? It would also help to know if you have any coding standards in mind - formatting, naming conventions, etc. :)
Agree! We do need some formatting and naming conventions, so that others can contribute more easily. For formatting, I recommend using http://godoc.org/code.google.com/p/go.tools/cmd/goimports For naming conventions, I have no idea yet. Another point: how do we organize so many learning algorithms? Does every algorithm go in its own "golearn/xxx" package?
@macmania: Sure I can. Like I said, I'm definitely willing to take suggestions from others. I don't want to be the sole arbiter of the project's direction. I will try and write some up tonight.
@lazywei: We should group algorithms together by commonalities. For example, a decision tree package can contain CART, Random Forests, etc. A neural network package could contain simple neural nets, RBM's, etc. So, not grouping by supervised/unsupervised, but by shared approaches to learning methods. I'd imagine that we'd also have a data package for cross validation/label encoding, some additional matrix/distance methods in another package, and some utility functions.
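One way to keep those per-approach packages interoperable is a shared interface that every algorithm package implements, so cross-validation and other utilities can treat them uniformly. This is only a sketch; `Classifier` and `MajorityClassifier` are hypothetical names, not existing golearn types.

```go
package main

import "fmt"

// Classifier is a hypothetical shared interface that each algorithm
// package (trees, neural, knn, ...) could implement, so data and
// cross-validation utilities can work against any of them.
type Classifier interface {
	Fit(X [][]float64, y []int)
	Predict(x []float64) int
}

// MajorityClassifier is a toy implementation: it always predicts the
// most common label seen during Fit.
type MajorityClassifier struct{ label int }

func (m *MajorityClassifier) Fit(X [][]float64, y []int) {
	counts := map[int]int{}
	for _, v := range y {
		counts[v]++
	}
	best := -1
	for label, c := range counts {
		if c > best {
			best, m.label = c, label
		}
	}
}

func (m *MajorityClassifier) Predict(x []float64) int { return m.label }

func main() {
	var c Classifier = &MajorityClassifier{}
	c.Fit([][]float64{{0}, {1}, {2}}, []int{1, 1, 0})
	fmt.Println(c.Predict([]float64{9})) // 1
}
```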
Would it be better if we grouped the algorithms by supervised and unsupervised? We can do that later on if we end up with tons of packages. @sjwhitworth: sounds good, I'll work on the library after you've done the documentation ^_^
@sjwhitworth: let me know if you need help writing the documentation for the project.
@macmania: Sure, why not. Let's just get writing some stuff, and we can refactor when we need to :) Definitely would love help writing documentation - I'll just put it all in a Markdown file for now, and then we can pretty it up later.
Awesome - do you mind posting a Google Drive document URL so we can edit as we go?
can you make it editable?
Hi! I find this an interesting project and I would like to collaborate. I have only a couple of questions:
I can implement the regressions, because I have already done it in C.
@marcoseravalli
I'm not sure whether it is a good practice. However, I found it in the "Choose a good import path" section. Maybe we should follow that.
You can pull this repository, make changes, and commit to your fork. The point is that you may need to clone this repo locally under github.com/sjwhitworth/golearn instead of github.com/marcoseravalli/golearn.
Surely it's easier if we just keep everything in the same repo, and then just send pull requests?
Yes, we can have tests in a separate folder. Let's not worry too much about structure at this point. We should start writing stuff, and then refactor as we go along - otherwise, it's premature optimisation.
I read a bit more about the structure of the packages and it makes sense to keep it the current way.
Ok sure, let's start writing some stuff!
What would you like to start working on?
I can start working on the regressions. Is it ok?
Logistic/linear or both?
I would start with linear first. Then I can move to the logistic.
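For the univariate case, linear regression doesn't even need the matrix library, since ordinary least squares has a closed form. A minimal sketch (the `fitLinear` name is hypothetical, not part of golearn):

```go
package main

import "fmt"

// fitLinear fits y ≈ slope*x + intercept by ordinary least squares
// for a single feature, using the closed-form solution.
func fitLinear(x, y []float64) (slope, intercept float64) {
	n := float64(len(x))
	var sx, sy, sxx, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	slope = (n*sxy - sx*sy) / (n*sxx - sx*sx)
	intercept = (sy - slope*sx) / n
	return
}

func main() {
	x := []float64{0, 1, 2, 3}
	y := []float64{1, 3, 5, 7} // exactly y = 2x + 1
	s, b := fitLinear(x, y)
	fmt.Printf("slope=%.1f intercept=%.1f\n", s, b) // slope=2.0 intercept=1.0
}
```

The multivariate case is where the matrix package earns its keep, via the normal equations or a QR decomposition.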
I opened an issue and assigned it to you, @marcoseravalli .
ok cool! i'll start working on it!
Hey @sjwhitworth - can you open up an issue for me - neural networks :) Thanks!
Do you guys also have experience with other linear algebra libraries for Go? For example, I found this one, which also has C bindings to BLAS: http://godoc.org/code.google.com/p/biogo.matrix
I also found an interesting organization on GitHub: https://github.com/gonum They implement some numeric libraries.
I think we should consider whether to keep using go.matrix in the future. ML depends heavily on linear algebra, matrix computations, etc., and go.matrix seems not to be maintained anymore. Additionally, I found that the matrix product in go.matrix was a little inconsistent when I implemented the metric functions. So we may need to find a better or more active package.
It's up to you guys. We should probably fork whichever library you prefer, build on top of that, and merge back into their master if it proves beneficial for them. I'll leave the decision up to you two, @lazywei + @marcoseravalli.
It seems both gonum/matrix/mat64 and biogo.matrix were created by the same author, and the former seems more active. So I'd prefer to use gonum/matrix/mat64. What do you say, @marcoseravalli ?
Let's move to mat64.
I also think that mat64 is a good choice
The documentation is pretty rubbish for mat64 @lazywei. We should fork, and write our own docs ourselves.
Totally agree with you! I will fork it, add some docs to the functions I understand, and send a PR back upstream.
Or could it also be an option to fork biogo.matrix? It seems to be better documented: http://godoc.org/code.google.com/p/biogo.matrix And since the project is still under development, we could both benefit from the changes. What do you think?
I think that sounds sensible. They seem to implement the same things. And if we're going to make a decision, it should be now, before we have to port lots of code. What say you, @lazywei ?
biogo.matrix also seems to provide a few more features w.r.t. error handling, but less arithmetic...
I love the presentation and docs on this project; I think it is going in the right direction. Going forward, I'd like to suggest that we implement a simple "web/API" package, which could be used as a drop-in replacement for some external services, or simply as a showcase for real-life usage.
This package might contain a few simple API endpoints and be very simple to understand and use (inspired, for example, by the Seldon project).
This could attract a wider audience of people with programming/web backgrounds who are seeking a simple prediction/recommendation solution without a strong ML background (myself being one of them).
Currently, although this project is great for Go, if I were to deploy something ML-related with a web stack like Python/Java, I'd build on statsmodels or opt for an external service, because those are easy to use and understand. But given the awesomeness of Go's concurrency and its much better latency for API development, this project could also provide a great alternative for people with a simple use case.
What do you think, @sjwhitworth and everyone? Although I'm only a beginner in Go, I would like to contribute to this great project as much as I can.
@anzellai I think it's a good idea so people can get an understanding of how to use this project. I do think such a thing should be a separate project repo though.
@nickpoorman I don't disagree; my suggestion is simply a way forward to attract a wider audience, in keeping with this project's spirit of being simple to use and understand.
Let's see how people feel about this idea, and then we can consider how to implement it.
I've spent some time thinking about this and I think it's a good idea. We need to improve our APIs to support various streaming/low-volume retraining and prediction events. At the moment, things in base assume that data will generally be a fixed size; I think it's time to change that assumption.
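Dropping the fixed-size assumption could look something like an incremental-update interface. This is only a sketch with hypothetical names (`Incremental`, `RunningMean`), not anything currently in base:

```go
package main

import "fmt"

// Incremental is a hypothetical interface for models that can absorb
// new observations after the initial fit, instead of assuming a
// fixed-size dataset up front.
type Incremental interface {
	Update(x []float64, y float64) // fold in one new observation
	Predict(x []float64) float64
}

// RunningMean is a toy incremental model: it ignores the features and
// predicts the running mean of the targets seen so far.
type RunningMean struct {
	n   int
	sum float64
}

func (r *RunningMean) Update(x []float64, y float64) {
	r.n++
	r.sum += y
}

func (r *RunningMean) Predict(x []float64) float64 {
	if r.n == 0 {
		return 0
	}
	return r.sum / float64(r.n)
}

func main() {
	var m Incremental = &RunningMean{}
	for _, y := range []float64{2, 4, 6} {
		m.Update(nil, y)
	}
	fmt.Println(m.Predict(nil)) // 4
}
```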
@Sentimentron +1
+1 perhaps we can create a resources package, simply wrapped with an interface over a user-defined model, to provide instant API endpoints, something like:
```
/api/{resources}/data/            [POST]
/api/{resources}/event/           [POST]
/api/{resources}/predict/         [GET]
/api/{resources}/report/{method}/ [GET]
```
We can also keep it really simple and leave all authentication stuff for user to implement.
I specifically wrote my own kNN and k-means clustering algorithms so I could use them in a service, because golearn took a more data-analysis-oriented and very static approach to modeling and training. Just feedback about this. At the time (a year ago) it made sense; things may have changed. It might make sense to think about how the library could be used in a service, and to have an example of doing so. I think a full ML API that adapts and retrains is a little beyond a core library's scope.
My 2c
@savorywatt I agree with your points, and it's probably time to start a separate repo to implement a full ML API. I also think some database support would be a good idea.
For everyone else, may I ask who would like to drive this forward? I would like to help (or be part of it) and make this happen.
This issue has been open for more than 6 years now, and it's not about something that can be fixed in code. Can it be closed?
I was wondering if it would be better to focus on a particular class of machine learning that is computationally intensive - dimensionality reduction, neural networks, Fourier transforms, etc. - rather than rewriting the algorithms. My take is that if we just used a source-to-source translator - Python -> Go or C++ -> Go - we might be able to save some time. The team could then just write some tests.