Larger-scale testing, cruise control

obo commented 7 years ago

Aside from the Travis checks, we definitely need regular (weekly?) tests of learning performance. These tests should take e.g. 8 hours on a GPU.

Let's use this issue to come up with a plan for how best to do this.

My preferred solution would be to mimic what we've done for Moses: Web frontend at a particular testing site: http://www.statmt.org/moses/cruise/

As for the implementation, well, it's a single straightforward bash script: https://github.com/moses-smt/mosesdecoder/tree/master/cruise-control

So my proposed solution:

a subdir cruise-control in NM
with a single bash script that every testing site will simply add to their cron (in our case, the cron entry would submit an SGE job, so the actual test can run later, when the queue is empty)
with web rendering of the logs, similar to Moses rendering
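
To make the proposal concrete, here is a minimal sketch of the helpers such a script might contain. It is not the Moses cruise-control script; the function names, the log naming scheme, and the paths are all assumptions made up for illustration. The only external command it mentions is SGE's `qsub`, and it only prints the command instead of running it.

```shell
#!/bin/bash
# Hypothetical cruise-control helpers; every name and path here is an assumption.

# Build a dated log path, e.g. logs/cruise-2017-02-06.log.
cruise_log_path() {
    local log_dir="$1" day="$2"
    printf '%s/cruise-%s.log\n' "$log_dir" "$day"
}

# Print the command the cron entry would execute: an SGE submission if
# qsub is on PATH, otherwise a direct run of the test script.
cruise_command() {
    local test_script="$1" log_file="$2"
    if command -v qsub >/dev/null 2>&1; then
        # -cwd: run in the current directory; -j y: merge stderr into stdout;
        # -o: write the job output to the given log file.
        printf 'qsub -cwd -j y -o %s %s\n' "$log_file" "$test_script"
    else
        printf 'bash %s > %s 2>&1\n' "$test_script" "$log_file"
    fi
}

# Example crontab entry (every Monday at 02:00), assuming the finished
# script is saved under a path like this one:
#   0 2 * * 1  /path/to/cruise-control/cruise.sh
```

Printing the command rather than executing it keeps the sketch safe to try on a machine without SGE; a real script would run the command and then render the resulting logs.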

jindrahelcl commented 7 years ago

I agree, but I would refrain from using GPUs for this, since 8 hours per week is pretty costly in our environment. I can't see any reason why these tests could not be run on CPUs. Eight hours of GPU time corresponds to roughly a bit over one day on a CPU, which seems reasonable given the number of CPU machines we have.

Second, we don't need a subdirectory for these tests - just one script in the tests/ dir should do. Then we can have an internal directory somewhere with the results of the tests, but those should not be a part of the repository. (Note: I'd leave discussion about the structure of the tests/ directory to another issue.)

Third, let's open a discussion about whether to use a cron job or something fancier, such as GitHub hooks. It might be nice to have our own local installation of Travis that would run those tests. This would also solve the visualisation for free, and we'd be working with the same environment.

Last (but not least), I would stick to the well-defined name "end-to-end testing" for this kind of testing, since "cruise control" is closer in meaning to continuous integration (CI), and we already have Travis to do CI.

martinpopel commented 7 years ago

If you decide on a local Travis (I am not convinced it is better than simple bash+cron, but anyway): usually, Travis tests each push (i.e. almost each commit). But you can have a special branch (e.g. called end-to-end-tests) with a special .travis.yml and merge master into this branch (preferably automatically) just once per week. However, if the tests fail (the performance or running time is much worse than last week), it would be nice to find the culprit commit with something like git bisect, and I am not sure whether this is possible with Travis.
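
For the bash+cron route, the culprit search could be driven by `git bisect run`, which treats exit status 0 as "good" and 1 as "bad". Below is a minimal sketch of the pass/fail check only; the function name, the score metric, and the tolerance are assumptions, not anything that exists in the repository.

```shell
#!/bin/bash
# Hypothetical check for use with `git bisect run`: returns 0 ("good") if the
# measured score has not dropped more than `tolerance` below the baseline
# (e.g. last week's result), and 1 ("bad") otherwise.
score_is_acceptable() {
    local score="$1" baseline="$2" tolerance="$3"
    # awk handles the floating-point comparison, which plain bash cannot do.
    awk -v s="$score" -v b="$baseline" -v t="$tolerance" \
        'BEGIN { exit !((s + 0) >= (b - t)) }'
}

# Possible usage, assuming a hypothetical check_commit.sh that trains the
# model on the current checkout and then calls score_is_acceptable:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run ./check_commit.sh
#   git bisect reset
```

Note that each bisect step reruns the full test, so with tests that take hours the search itself could take days.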

jlibovicky commented 7 years ago

This has been done by starting the Monkey Daemon.

obo commented 7 years ago

Great! Are the logs available somewhere online?

February 2017 16:24:19 CET, "Jindřich Libovický" notifications@github.com wrote:

This has been done by starting the Monkey Daemon.

-- Ondrej Bojar (mailto:obo@cuni.cz / bojar@ufal.mff.cuni.cz) http://www.cuni.cz/~obo

ufal / neuralmonkey

Larger-scale testing, cruise control #268