Isolation of evaluations

For going forward we need to isolate the evaluation runs. This will allow us in the end to run evaluations of multiple models in parallel on a single host or in a cluster.

1 Iteration:

[x] Create a Docker image
- [x] Contains eval-dev-quality binary
- [x] Has all the necessary prerequisites installed from an archive with fixed versions
- [x] Java
- [x] Maven
- [x] Gradle
- [x] Go
- [x] eval-dev-quality install-all
[x] Documentation
- [x] How to build locally (bash script)
- [x] How to run (bash script)
- [x] Script to run multiple local docker instances simultaneously
- [x] ~~Script to run multiple instances inside kubernetes~~

2 Iteration:

[x] Build the image on each PR (+ main) and publish it on Github registry
[x] Add an additional option --runtime docker (default is "local" which runs as before)
- [x] If specified each model will be run inside a docker container locally
[x] Add an additional option --parallel $uint (default is "1")
- [x] The --parallel defines how many models are running in parallel
- [ ] The option is only allowed if the runtime != local
- [x] ~~Add a check for --sequential to be only allowed when runtime == local~~
- [ ] Print an information that --sequential is skipped if runtime != local but passed on to the subsequent runs

3 Iteration:

[ ] Add an additional runtime kubernetes
[ ] Runs all the models simultaneously on a Kubernetes cluster
[ ] Uses the local installed kubectl cmd and default context

4 Iteration:

[ ] Merge the results from runtimes docker and kubernetes back into into a summary

5 Iteration:

[ ] Support Ollama in container

symflower / eval-dev-quality

Isolation of evaluations #198

1 Iteration:

2 Iteration:

3 Iteration:

4 Iteration:

5 Iteration: