macmania opened this issue 10 years ago
I'd rather write them ourselves. That way, we can better optimise to any advantages that Go will give us - e.g. using goroutines to independently train random forests. Plus, I could do with the practice :)
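To make the goroutine point concrete, here's a minimal sketch of training forest trees concurrently, one goroutine per tree. The `trainTree`/`trainForest` names are hypothetical placeholders, not anything in golearn; a real version would fit each tree on a bootstrap sample.

```go
package main

import (
	"fmt"
	"sync"
)

// trainTree stands in for fitting one decision tree on a bootstrap
// sample; here it just returns the tree's index as a placeholder.
func trainTree(id int) int {
	return id
}

// trainForest fits n trees concurrently, one goroutine per tree.
func trainForest(n int) []int {
	trees := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			trees[i] = trainTree(i) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return trees
}

func main() {
	fmt.Println(len(trainForest(10))) // 10
}
```

Since each goroutine writes a distinct slice index, no mutex is needed around `trees`.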
Then I think we may need to implement some basic matrix computation algorithms first. If we want to rebuild whole algorithms, we will definitely need some fundamental infrastructure.
Yes, I agree. There has been a good start on a matrix library that is imported in this library, but it's probable that we will need to do a lot more ourselves. Eigenvectors, etc.
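To give a flavour of the kind of fundamental infrastructure involved, here is a minimal dense matrix product in pure Go. This is just a sketch of the simplest primitive; a real library would also want BLAS-style routines, decompositions, eigenvectors, and so on.

```go
package main

import "fmt"

// matMul computes the product of an m×n matrix a and an n×p matrix b,
// both stored as row-major [][]float64.
func matMul(a, b [][]float64) [][]float64 {
	m, n, p := len(a), len(b), len(b[0])
	out := make([][]float64, m)
	for i := 0; i < m; i++ {
		out[i] = make([]float64, p)
		for k := 0; k < n; k++ { // k in the middle loop is cache-friendlier
			for j := 0; j < p; j++ {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{5, 6}, {7, 8}}
	fmt.Println(matMul(a, b)) // [[19 22] [43 50]]
}
```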
If we want to do so many things ourselves, we should probably think about a better way to organize the packages/subpackages, etc. Any thoughts on this?
I'm agnostic. I'm willing to take suggestions on it, or just to start writing things, and continually refactor.
Can you work on the documentation part of the project, just to give the people working on it an idea of how to run and test a particular component? It would also help to know if you have any coding standards in mind - formatting, naming conventions, etc. :)
Agree! We do need some formatting and naming conventions, so that others can contribute more easily. For formatting, I recommend using http://godoc.org/code.google.com/p/go.tools/cmd/goimports For naming conventions, I have no idea yet. Another point: how do we organize so many learning algorithms? Does every algorithm go in its own "golearn/xxx" package?
@macmania: Sure I can. Like I said, I'm definitely willing to take suggestions from others. I don't want to be the sole arbiter of the project's direction. I will try and write some up tonight.
@lazywei: We should group algorithms together by commonalities. For example, a decision tree package can contain CART, Random Forests, etc. A neural network package could contain simple neural nets, RBM's, etc. So, not grouping by supervised/unsupervised, but by shared approaches to learning methods. I'd imagine that we'd also have a data package for cross validation/label encoding, some additional matrix/distance methods in another package, and some utility functions.
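One way to keep those per-approach packages interoperable is a shared interface that every algorithm package implements, so cross-validation and other utilities can treat them uniformly. This is only a sketch; `Classifier` and `MajorityClassifier` are hypothetical names, not existing golearn types.

```go
package main

import "fmt"

// Classifier is a hypothetical shared interface that each algorithm
// package (trees, neural, knn, ...) could implement, so data and
// cross-validation utilities can work against any of them.
type Classifier interface {
	Fit(X [][]float64, y []int)
	Predict(x []float64) int
}

// MajorityClassifier is a toy implementation: it always predicts the
// most common label seen during Fit.
type MajorityClassifier struct{ label int }

func (m *MajorityClassifier) Fit(X [][]float64, y []int) {
	counts := map[int]int{}
	for _, v := range y {
		counts[v]++
	}
	best := -1
	for label, c := range counts {
		if c > best {
			best, m.label = c, label
		}
	}
}

func (m *MajorityClassifier) Predict(x []float64) int { return m.label }

func main() {
	var c Classifier = &MajorityClassifier{}
	c.Fit([][]float64{{0}, {1}, {2}}, []int{1, 1, 0})
	fmt.Println(c.Predict([]float64{9})) // 1
}
```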
Would it be better if we grouped the algorithms by supervised and unsupervised? We can do that later on if we end up with tons of packages. @sjwhitworth: sounds good, I'll work on the library after you've done the documentation ^_^
@sjwhitworth: let me know if you need help writing the documentation for the project.
@macmania: Sure, why not. Let's just get writing some stuff, and we can refactor when we need to :) Definitely would love help writing documentation - I'll just put it all in a Markdown file for now, and then we can pretty it up later.
Awesome - do you mind posting a Google Drive document URL so we can edit as we go?
can you make it editable?
Hi! I find this an interesting project and I would like to collaborate. I have only a couple of questions:
I can implement the regressions, because I have already done it in C.
@marcoseravalli
I'm not sure whether it is a good practice. However, I found it in the "Choose a good import path" section. Maybe we should follow that.
You can pull this repository, make changes, and commit to your fork. The point is that you may need to clone this repo locally under github.com/sjwhitworth/golearn instead of github.com/marcoseravalli/golearn.
Surely it's easier if we just keep everything in the same repo, and then just send pull requests?
Yes, we can have tests in a separate folder. Let's not worry too much about structure at this point. We should start writing stuff, and then refactor as we go along - otherwise, it's premature optimisation.
I read a bit more about the structure of the packages and it makes sense to keep it the current way.
Ok sure, let's start writing some stuff!
What would you like to start working on?
I can start working on the regressions. Is it ok?
Logistic/linear or both?
I would start with linear first. Then I can move to the logistic.
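For the univariate case, linear regression doesn't even need the matrix library, since ordinary least squares has a closed form. A minimal sketch (the `fitLinear` name is hypothetical, not part of golearn):

```go
package main

import "fmt"

// fitLinear fits y ≈ slope*x + intercept by ordinary least squares
// for a single feature, using the closed-form solution.
func fitLinear(x, y []float64) (slope, intercept float64) {
	n := float64(len(x))
	var sx, sy, sxx, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	slope = (n*sxy - sx*sy) / (n*sxx - sx*sx)
	intercept = (sy - slope*sx) / n
	return
}

func main() {
	x := []float64{0, 1, 2, 3}
	y := []float64{1, 3, 5, 7} // exactly y = 2x + 1
	s, b := fitLinear(x, y)
	fmt.Printf("slope=%.1f intercept=%.1f\n", s, b) // slope=2.0 intercept=1.0
}
```

The multivariate case is where the matrix package earns its keep, via the normal equations or a QR decomposition.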
I opened an issue and assigned it to you, @marcoseravalli .
ok cool! i'll start working on it!
Hey @sjwhitworth - can you open up an issue for me - neural networks :) Thanks!
Do you guys also have experience with other linear algebra libraries for Go? For example, I found this one, which also has C bindings to BLAS: http://godoc.org/code.google.com/p/biogo.matrix
I also found an interesting organization on GitHub: https://github.com/gonum They implement some numeric libraries.
I think we should consider whether to keep using go.matrix in the future. ML depends heavily on linear algebra, matrix computations, etc., and go.matrix seems not to be maintained anymore. Additionally, I found that the matrix product in go.matrix was a little inconsistent when I implemented the metric functions. So we may need to find a better or more active package.
It's up to you guys. We should probably fork whichever library you prefer, build on top of that, and merge back into their master if it proves beneficial for them. I'll leave the decision up to you two, @lazywei + @marcoseravalli.
It seems both gonum/matrix/mat64 and biogo.matrix were created by the same author, and the former seems more active. So I'd prefer to use gonum/matrix/mat64. What do you say, @marcoseravalli ?
Let's move to mat64.
I also think that mat64 is a good choice
The documentation is pretty rubbish for mat64 @lazywei. We should fork, and write our own docs ourselves.
Totally agree with you! I will fork it, add some docs to the functions I understand, and send a PR back upstream.
Or could it also be an option to fork biogo.matrix? It seems to be better documented: http://godoc.org/code.google.com/p/biogo.matrix And since the project is still under development, we could both benefit from the changes. What do you think?
I think that sounds sensible. They seem to implement the same things. And if we're going to make a decision, it should be now, before we have to port lots of code. What say you, @lazywei ?
biogo.matrix also seems to provide a few more features w.r.t. error handling, but less arithmetic...
I love the presentation and docs on this project; I think it is going in the right direction. Going forward, I'd like to suggest that we implement a simple "web/API" package, which could be used as a drop-in replacement for some external services, or simply as a showcase for real-life usage.
This package might contain a few simple API endpoints and be very simple to understand and use (inspired, for example, by the Seldon project).
This could attract a wider audience of people with programming/web backgrounds who are seeking a simple prediction/recommendation solution without a strong ML background (myself being one of them).
Currently, although this project is great for Go, if I were to deploy something ML-related with a web stack like Python/Java, I'd build on statsmodels or opt for an external service, because those are easy to use and understand. But given the awesomeness of Go's concurrency and its much better latency for API development, this project could also provide a great alternative for people with a simple use case.
What do you think, @sjwhitworth and everyone? Although I'm only a beginner in Go, I would like to contribute to this great project as much as I can.
@anzellai I think it's a good idea so people can get an understanding of how to use this project. I do think such a thing should be a separate project repo though.
@nickpoorman I don't disagree; my suggestion is simply a way forward to attract a wider audience, in keeping with this project's spirit of being simple to use and understand.
Let's see how people feel about this idea, and then we can consider how to implement it.
I've spent some time thinking about this and I think it's a good idea. We need to improve our APIs to support various streaming/low-volume retraining and prediction events. At the moment, things in base assume that data will generally be a fixed size; I think it's time to change that assumption.
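Dropping the fixed-size assumption could look something like an incremental-update interface. This is only a sketch with hypothetical names (`Incremental`, `RunningMean`), not anything currently in base:

```go
package main

import "fmt"

// Incremental is a hypothetical interface for models that can absorb
// new observations after the initial fit, instead of assuming a
// fixed-size dataset up front.
type Incremental interface {
	Update(x []float64, y float64) // fold in one new observation
	Predict(x []float64) float64
}

// RunningMean is a toy incremental model: it ignores the features and
// predicts the running mean of the targets seen so far.
type RunningMean struct {
	n   int
	sum float64
}

func (r *RunningMean) Update(x []float64, y float64) {
	r.n++
	r.sum += y
}

func (r *RunningMean) Predict(x []float64) float64 {
	if r.n == 0 {
		return 0
	}
	return r.sum / float64(r.n)
}

func main() {
	var m Incremental = &RunningMean{}
	for _, y := range []float64{2, 4, 6} {
		m.Update(nil, y)
	}
	fmt.Println(m.Predict(nil)) // 4
}
```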
@Sentimentron +1
+1 perhaps we can create a resources package, simply wrapped with an interface over a user-defined model, to provide instant API endpoints, something like:
```
/api/{resources}/data/            [POST]
/api/{resources}/event/           [POST]
/api/{resources}/predict/         [GET]
/api/{resources}/report/{method}/ [GET]
```
We can also keep it really simple and leave all authentication stuff for user to implement.
I specifically wrote my own kNN and k-means clustering algorithms so I could use them in a service, because golearn took a more data-analysis-oriented and very static approach to modeling and training. Just feedback about this. At the time (a year ago) it made sense; things may have changed. It might make sense to think about how the library could be used in a service, and to have an example of doing so. I think a full ML API that adapts and retrains is a little beyond a core library's scope.
My 2c
@savorywatt I agree with your points, and it's probably time to start a separate repo to implement a full ML API. I also think some database support would be a good idea.
For everyone else, may I ask who would like to drive this forward? I would like to help (or be part of it) and make this happen.
This issue has been open for more than 6 years now, and it's not about something that can be fixed in code. Can it be closed?
I was wondering if it would be better to focus on a particular class of machine learning that is computationally intensive - dimensionality reduction, neural networks, Fourier transforms, etc. - rather than rewriting the algorithms. My take is that if we just used a source-to-source translator - Python -> Go or C++ -> Go - we might be able to save some time. The team could then just write some tests.