yandex / rep

Machine Learning toolbox for Humans
http://yandex.github.io/rep/

A Simple but Complete Example? #89

Open dakami opened 8 years ago

dakami commented 8 years ago

Hi! I really like your Machine Learning for Humans focus. Any chance you might provide an example that:

a) reads from a CSV file, with column labels in the first row;
b) builds a classification or a regression model;
c) reports whether the model works (split the CSV, do folding, etc.);
d) saves the model to disk;
e) applies that model to another CSV file with an identical schema, adding a column for predictions or replacing the existing column?
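A minimal sketch of steps a) to e). It assumes pandas and scikit-learn (not REP) and a binary 0/1 target column named `label`. A simple holdout split stands in for folding in step c). The helpers `make_toy_csv` and `run_pipeline`, the file names, and the toy data are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def make_toy_csv(path, n_rows, seed):
    """Hypothetical helper: write a toy CSV with a header row, two features and a 0/1 label."""
    rng = np.random.RandomState(seed)
    df = pd.DataFrame(rng.normal(size=(n_rows, 2)), columns=["x1", "x2"])
    df["label"] = (df["x1"] + df["x2"] > 0).astype(int)
    df.to_csv(path, index=False)


def run_pipeline(train_path, new_path, model_path, out_path, target="label"):
    """Hypothetical helper covering steps a) to e) with pandas + scikit-learn."""
    # a) read the CSV; pandas takes column names from the first row by default
    train = pd.read_csv(train_path)
    features = [c for c in train.columns if c != target]

    # c) hold out 30% of the rows to check that the model generalizes
    X_tr, X_te, y_tr, y_te = train_test_split(
        train[features], train[target], test_size=0.3, random_state=0)

    # b) build a classification model on the training part only
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

    # d) save the fitted model to disk
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # e) reload it, score another CSV with the same schema, add a prediction column
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)
    new = pd.read_csv(new_path)
    new["prediction"] = loaded.predict_proba(new[features])[:, 1]
    new.to_csv(out_path, index=False)
    return auc, new


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    make_toy_csv(tmp / "train.csv", 500, seed=1)
    make_toy_csv(tmp / "new.csv", 100, seed=2)
    auc, scored = run_pipeline(tmp / "train.csv", tmp / "new.csv",
                               tmp / "model.pkl", tmp / "scored.csv")
    print(f"holdout ROC AUC: {auc:.3f}")
    print(scored.head())
```

For a regression target, a regressor and a regression metric would replace the classifier and ROC AUC. The rest of the flow stays the same.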

jonas-eschle commented 8 years ago

I think this may be too off-topic. It does not look very difficult, and it is mostly not focused on what the REP repo provides. Anyway, if you want to do it, why don't you just create it? I would rather the developers worked on implementations than on simple examples ;)

a) Use pandas.read_csv().
b) I think there are enough examples in this repo to figure out how that works: http://nbviewer.jupyter.org/github/yandex/rep/blob/master/howto/01-howto-Classifiers.ipynb
c) Well... the FoldingClassifier together with the ClassificationReport does this. HowTo: http://nbviewer.jupyter.org/github/yandex/rep/blob/master/howto/04-howto-folding.ipynb
d) There is an example for the new CacheClassifier in the docs. To save it reliably, use pickle.
e) Same as c) for the classification part; have a look at pandas for how to add columns.
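The reply points to REP's FoldingClassifier for step c). This sketch avoids guessing REP's exact signatures and swaps in scikit-learn's `cross_val_predict` instead. It yields out-of-fold predictions: every row is scored by a model that did not see it during training. As far as we understand, that is roughly the idea FoldingClassifier applies to its training data. The toy data and the choice of 3 stratified folds are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Hypothetical toy data: two numeric features and a 0/1 label.
rng = np.random.RandomState(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Each row's probability comes from a model fitted on the other two folds,
# so the score below is an out-of-fold estimate rather than a training score.
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
oof_proba = cross_val_predict(
    GradientBoostingClassifier(random_state=0), X, y,
    cv=folds, method="predict_proba")[:, 1]

print(f"out-of-fold ROC AUC: {roc_auc_score(y, oof_proba):.3f}")
```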

What do you think?

Good luck with the challenge you are working on right now ;D

dakami commented 8 years ago

Hmm. I'm curious: what is the topic, and what is it that REP provides?

I've been creating a significant amount of this code. Pickle works in some projects, sometimes, in some contexts. Other times there actually isn't a way to serialize the model at all, which is sort of funny.

arogozhnikov commented 8 years ago

@dakami

It's hard to provide a minimal complete example, as different people approach this differently. Physicists work with .root files (and never use CSV); others use HDF5.
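One way to keep the rest of a pipeline format-agnostic is to isolate loading behind a single function. The `load_table` helper below is hypothetical. The HDF5 branch relies on pandas' optional PyTables dependency. The ROOT branch is deliberately left unimplemented, because reading .root files needs a separate library (uproot is one option) that this sketch does not assume.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical helper: load a tabular file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # column names are taken from the first row by default
        return pd.read_csv(path)
    if suffix in (".h5", ".hdf5"):
        # needs PyTables; the key may be omitted only if the file holds a single pandas object
        return pd.read_hdf(path)
    if suffix == ".root":
        # ROOT files need a dedicated reader, which is not sketched here
        raise NotImplementedError("plug in a ROOT reader such as uproot")
    raise ValueError(f"unsupported file type: {suffix}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "events.csv"
    pd.DataFrame({"x1": [0.1, 0.2], "label": [0, 1]}).to_csv(csv_path, index=False)
    df = load_table(csv_path)

print(df.shape)  # (2, 2)
```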

I'll try to fit something like a minimal pipeline into the next rewrite of the example notebooks, but no guarantees so far.

As for pickle – REP contains wrappers for several libraries, the wrappers follow the same scikit-learn-like interface, and we make sure (among other things) that pickle works for them.
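A quick way to check the pickle claim for any fitted estimator with a scikit-learn-style interface, whether it is an REP wrapper or not, is to round-trip it through `pickle` and compare predictions. The `survives_pickle` helper and the toy data below are hypothetical. A plain scikit-learn `LogisticRegression` stands in for a wrapped model.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression


def survives_pickle(model, X):
    """Hypothetical helper: True if the model predicts identically after a pickle round trip."""
    restored = pickle.loads(pickle.dumps(model))
    return np.array_equal(model.predict_proba(X), restored.predict_proba(X))


rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)
print(survives_pickle(model, X))  # True
```

The same check could be run on any model before writing it to disk. It catches the "pickle works sometimes" cases at save time rather than at prediction time.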

Additionally, there are meta-estimators to compose models and simplify the training process, plus some other goodies (check out the documentation for details).

anaderi commented 8 years ago

@dakami, I'd really appreciate it if you could add a how-to similar to this howto example, covering all the steps you've mentioned. If you get stuck, let us know.