Need more flexible task scheduler

st-pasha commented 5 years ago

The benchmark suite grows both in depth and in width. In the near future we will not be able to afford the "run everything" strategy anymore. Limiting the number of tasks that are run every day will be a necessity. Approaches such as #50 are a good starting point, but that may not be enough.

I propose to implement a new system, where at the beginning of the benchmarking session each task will be assigned a "worth" score. The tasks can then be sorted by that score, and only the top ones run (until the benchmarking server runs out of time).

The function that evaluates the worthiness of each task will be dynamic, taking into account many factors:

How long ago the task was run before;
Estimated run time of the task, based on the previous runs;
Relative importance of the task (e.g. join/groupby may be more useful than filtering/updating);
Relative importance of the solution being measured (e.g. data.table / datatable should be tested more frequently than others);
Age of the latest version of the solution, esp. compared to the time the task was run before (#50);
Manual overrides, allowing the user to force execution of a particular task;
etc.

mattdowle commented 5 years ago

This proposal, although good in a perfect world, seems overly complicated for the resources we have right now. Putting the slow products onto a weekly schedule and keeping the fast products running daily is simpler and would be more than adequate for the time being. In any case I'm not sure what #50's proposal actually is. Which is why I had already reopened it and asked for clarification.

jangorecki commented 5 years ago

Idea sounds interesting but at the present moment there is enough of maintenance around various exceptions in different tool so that growth is actually blocked but that. I even stopped reporting issues to upstream projects due to lack of time to produce isolated reproducible examples. Once all exceptions will be handled I think it will be running smoothly. Most of tools are not on devel versions (as of now) so they won't run often anyway. Once we will have join and maybe a next task then we will probably need to prioritize tests.

jangorecki commented 5 years ago

Benchmark is not being run often these days. Once a week or once every two weeks. The reason for that is that most of the tools are not being run anyway. It is because we are using their release version, not development. And this is caused by the fact they don't publish development releases, nightly snapshots, etc. Pulling master branch, building from source and using that is not really a proper way to go (sometimes even discouraged by devs of a solution). If developers of a tool are not publishing their development builds then we could assume they have good reasons. Maybe it is not ready to be used, might not even build, might contain security issues, whatever. We currently build from sources pydt and dplyr, hope to switch to nightly releases soon. In my opinion with the current load of benchmark we can safely close this issue, as the time is not a problem after implementing #50.

h2oai / db-benchmark

Need more flexible task scheduler #52