Set up reliable automated benchmarks

Background

There is no single measure by which a contextionary performs better or worse, but a set of benchmarks can approximate the features we expect from it. Benchmarking one contextionary across all scenarios should be a "one-click" operation.

Goals

- A push triggers a Travis (or similar CI) job
- The job runs a combination of settings, e.g.
  - different k values for kNN classification
  - different cut-off parameters for contextual classification
- The job can be extended with new benchmarks over time
- The benchmark results are stored in an easily accessible format (e.g. an HTML page is copied into a GCS bucket which acts as a web server)
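
The goals above could be sketched roughly as follows. This is only an illustration under assumptions, not an existing part of the contextionary: the `Benchmark` tuple, `run_all`, `render_html`, the benchmark names, the parameter values, and the output file name are all hypothetical, and the scoring functions are placeholders that would have to be replaced by real measurements. Each benchmark carries its own list of settings, so adding a new benchmark means adding one entry to the suite.

```python
from html import escape
from typing import Callable, Dict, List, NamedTuple, Sequence, Union

Number = Union[int, float]


class Benchmark(NamedTuple):
    """One benchmark plus the settings to run it with (hypothetical structure)."""
    name: str                        # e.g. "knn-classification"
    param: str                       # name of the varied setting, e.g. "k"
    values: Sequence[Number]         # settings to try
    run: Callable[[Number], float]   # returns a score for one setting


def run_all(benchmarks: Sequence[Benchmark]) -> List[Dict[str, object]]:
    """Run every benchmark once per configured setting and collect the rows."""
    rows: List[Dict[str, object]] = []
    for bench in benchmarks:
        for value in bench.values:
            rows.append({
                "benchmark": bench.name,
                "param": bench.param,
                "value": value,
                "score": bench.run(value),
            })
    return rows


def render_html(rows: Sequence[Dict[str, object]]) -> str:
    """Render the result rows as a minimal, self-contained HTML table."""
    columns = ("benchmark", "param", "value", "score")
    header = "<tr><th>Benchmark</th><th>Parameter</th><th>Value</th><th>Score</th></tr>"
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[c]))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<!DOCTYPE html><html><body><table>{header}{body}</table></body></html>"


if __name__ == "__main__":
    # Placeholder scorers (assumption): replace with real contextionary measurements.
    suite = [
        Benchmark("knn-classification", "k", [1, 3, 5], lambda k: 0.0),
        Benchmark("contextual-classification", "cutoff", [0.25, 0.5], lambda c: 0.0),
    ]
    with open("benchmarks.html", "w", encoding="utf-8") as fh:
        fh.write(render_html(run_all(suite)))
```

A CI job would run this script on every push and then copy the generated file into the bucket; the upload step is left out here because it depends on how the CI environment is authenticated.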

weaviate / contextionary

Set up reliable automated benchmarks #31