sjwhitworth / golearn

Machine Learning for Go

Add performance benchmarking for algorithms #72

Open · amitkgupta opened this issue 10 years ago

Sentimentron commented 10 years ago

I like this idea a lot, but we have to be mindful of the practicalities of checking lots of data into the source tree. We could host the data in a separate repo and use a download script.

sjwhitworth commented 10 years ago

Yes, this is how we should do it: store only code in the repo, and use a Go script to download all the data.
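To make the approach concrete, here's a minimal sketch of what such a download script might look like. The hosting URL and file names are placeholders, not a real data repo:

```go
// download_data.go - a sketch of the kind of download script discussed
// above. baseURL and datasets are hypothetical placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
)

const baseURL = "https://example.com/golearn-benchmark-data/"

var datasets = []string{"iris.csv", "mnist_subset.csv"}

// fetch downloads one named file into the local data/ directory.
func fetch(name string) error {
	resp, err := http.Get(baseURL + name)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("fetching %s: %s", name, resp.Status)
	}
	out, err := os.Create(filepath.Join("data", name))
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	if err := os.MkdirAll("data", 0755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, name := range datasets {
		if err := fetch(name); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```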

amitkgupta commented 10 years ago

I've started writing a benchmarking suite, here's a quick update on philosophy, features, current status, caveats, and open questions.

Philosophy: the general idea is to have a suite of tests that stress the algorithms in the golearn library in a number of ways, establishing benchmarks for accuracy and speed. I want the tests to be highly decoupled from the implementation (e.g. for classifiers, the suite should only know how to create them and then call Fit and Predict on them, and not much else) and also decoupled from the regular golearn workflow (I don't want people to have to run slow tests or download large datasets to work on golearn). For those reasons, and also since it's a new project that's likely to see a lot of churn for now, it lives in a separate repo from golearn, but can be folded in later if keeping things in sync becomes painful.
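To illustrate the decoupling, the suite would only need to program against something like the following minimal interface. The names here are illustrative, not golearn's actual types:

```go
// A sketch of the minimal surface the benchmark suite would depend on.
package bench

// Dataset stands in for whatever instance container golearn exposes.
type Dataset interface{}

// Classifier is the only contract the suite needs: fit on training
// data, predict on test data. Everything else stays encapsulated.
type Classifier interface {
	Fit(train Dataset)
	Predict(test Dataset) Dataset
}
```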

Even though it's "out of the way", the suite still needs to provide value as a regression check against changes that hurt performance, and as a standard for deciding whether new algorithm optimizations actually improve things. I imagine the Travis build should go get and then run the tests in the benchmark suite, so that it serves its purpose as a regression suite.

Features: structurally, I plan for it to have a suite for classifiers, a suite for optimization algorithms, etc. Each suite will benchmark the behaviour of some number of algorithms (whichever ones are implemented in golearn) against a common set of datasets for the suite. So each suite will consist of three main things:

(a) datasets;
(b) shared behaviours that make assertions about how an algorithm in the suite performs against a given dataset; and
(c) concrete applications of the shared behaviours to the specific algorithms in golearn.

One nice thing is that anyone can use (a), and anyone writing an ML library who wraps their algorithms in something that implements the interfaces defined in golearn can even use (b). This fits with the "decoupled" philosophy: the project tries to solve "how do you benchmark an ML library, then apply that to golearn" rather than just "how do you benchmark golearn."
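A rough sketch of what a shared behaviour (b) could look like, building on the Classifier and Dataset types sketched above. The split and accuracy helpers are hypothetical stubs, and the threshold is supplied by the caller:

```go
package bench

import "testing"

// ItClassifiesAccurately is a shared behaviour in the sense described
// above: it knows nothing about the algorithm beyond the Classifier
// interface, so any conforming implementation can be plugged in.
func ItClassifiesAccurately(t *testing.T, c Classifier, data Dataset, minAccuracy float64) {
	train, test := split(data, 0.7) // hypothetical 70/30 split helper
	c.Fit(train)
	predictions := c.Predict(test)
	if acc := accuracy(test, predictions); acc < minAccuracy {
		t.Errorf("accuracy %.3f below threshold %.3f", acc, minAccuracy)
	}
}

// split and accuracy are placeholders standing in for real helpers.
func split(d Dataset, frac float64) (train, test Dataset) { return d, d }
func accuracy(test, predicted Dataset) float64            { return 0 }
```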

Current Status: I've only started on the Classifier suite, and the shared behaviours for that are done. I only have one basic dataset so far, and have only applied them to one algorithm. Adding more datasets and plugging in different classifiers will be easy now. Not sure yet what the next suite will be.

Caveat: it works against (the develop branch of) my fork of golearn. The only salient difference between that branch and this repo's master is that Fit and Predict now include errors in their return signatures. I noticed that Fit would sometimes just hang if the input wasn't what it expected; after fixing that, it was clear that Fit, and also Predict, should really be able to return errors.
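Building on the earlier interface sketch, the revised contract would look something like this (illustrative only; the fork's actual signatures may differ):

```go
// Failures surface as errors instead of hangs: Fit reports bad input,
// and Predict reports problems alongside its result.
type Classifier interface {
	Fit(train Dataset) error
	Predict(test Dataset) (Dataset, error)
}
```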

Open Questions:

joshrendek commented 10 years ago

@amitkgupta http://www.quandl.com/ is free and has a ton of nice data available

Sentimentron commented 10 years ago

It's been mentioned recently, but the libsvm project has some good datasets (http://www.csie.ntu.edu.tw/%7Ecjlin/libsvmtools/datasets/), though we can't read them yet.

amitkgupta commented 10 years ago

Nice, the libsvm datasets look great; the table even shows the number of features, classes, and the total size of each dataset. Exactly the kind of breakdown I was hoping for.
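For reference, LIBSVM datasets use a sparse plain-text format, one instance per line: a label followed by index:value pairs. A minimal sketch of a reader for that format (the names are illustrative; golearn had no such parser at the time):

```go
package libsvm

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Instance holds one labelled sparse example.
type Instance struct {
	Label    string
	Features map[int]float64 // 1-based feature index -> value
}

// Read parses LIBSVM-formatted data (<label> <index>:<value> ...) from r.
func Read(r io.Reader) ([]Instance, error) {
	var out []Instance
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		inst := Instance{Label: fields[0], Features: map[int]float64{}}
		for _, f := range fields[1:] {
			parts := strings.SplitN(f, ":", 2)
			if len(parts) != 2 {
				return nil, fmt.Errorf("malformed feature %q", f)
			}
			idx, err := strconv.Atoi(parts[0])
			if err != nil {
				return nil, err
			}
			val, err := strconv.ParseFloat(parts[1], 64)
			if err != nil {
				return nil, err
			}
			inst.Features[idx] = val
		}
		out = append(out, inst)
	}
	return out, scanner.Err()
}
```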
