symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Parallel execution of containerized evaluations #221

Open Munsio opened 2 days ago

Munsio commented 2 days ago

@bauersimon I would like to have your opinion on this one.

bauersimon commented 1 day ago

Please rebase and then tag me again.

Munsio commented 1 day ago

Alright, I got it to work. I also updated the follow-up issue to add tests for --parallel when we run the Docker tests on the CI.

Please check.

Also, if you have an idea for testing the parallel execution, please tell me.

Here is the command I used to run it:

make install && eval-dev-quality evaluate --runtime docker --result-path ./docker-test --runs 5 --model symflower/symbolic-execution --model symflower/symbolic-execution --model symflower/symbolic-execution --repository golang/plain --parallel 2

While this command is running, you can use watch docker ps to see that two containers are started first and that a third one runs after they have finished.
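
For illustration, the limit described above can be sketched in Go with a buffered channel used as a counting semaphore. This is a minimal sketch and not the code from this PR: the helper name runBounded and its signature are hypothetical, and each task stands in for one containerized model evaluation. Note that this sketch starts the next task as soon as any slot frees up, which may differ from how the actual implementation schedules containers.

```go
package main

import "sync"

// runBounded is a hypothetical helper (not taken from eval-dev-quality).
// It runs every task, never more than "parallel" at the same time, and
// returns the highest number of tasks that were running concurrently.
// "parallel" must be at least 1, otherwise the first send blocks forever.
func runBounded(parallel int, tasks []func()) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	// A buffered channel acts as a counting semaphore with "parallel" slots.
	slots := make(chan struct{}, parallel)

	for _, task := range tasks {
		slots <- struct{}{} // Blocks while "parallel" tasks are already running.
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-slots }() // Free the slot for the next task.

			// Record how many tasks are running right now.
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()

	return peak
}
```

With three tasks and a limit of 2, this mirrors the command above: two tasks start right away and the third has to wait for a free slot. Checking that the returned peak never exceeds the limit could be one way to test the behavior without watching docker ps, assuming the real implementation exposes a comparable hook.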