Follow up - Isolated evaluations

symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

MIT License

Open Munsio opened 5 days ago

Munsio commented 5 days ago

The following tasks need to be addressed:

[ ] Check how to run a Docker container in GitHub Actions
[ ] Add tests for the "docker" runtime that succeed inside the GitHub CI
[ ] Add tests for running "docker" runtime with different --parallel arguments
[ ] Add tests for running "kubernetes" runtime
- [ ] Test at least the template generation
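
As a starting point for the "docker" runtime tests, here is a minimal Go sketch of how a test could skip itself when no Docker binary is present and iterate over several `--parallel` values. Only `--parallel` and the runtime name "docker" come from the task list. The helper names `dockerAvailable` and `evaluateArguments`, the `evaluate` subcommand, and the `--runtime` flag are assumptions for illustration and should be checked against the actual CLI.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// dockerAvailable reports whether a "docker" binary can be found on the PATH.
// It does not check that the Docker daemon is actually running.
func dockerAvailable() bool {
	_, err := exec.LookPath("docker")
	return err == nil
}

// evaluateArguments builds the argument list for one evaluation run.
// ASSUMPTION: "evaluate" and "--runtime" are hypothetical here, only
// "--parallel" is named in the task list.
func evaluateArguments(runtime string, parallel int) []string {
	return []string{
		"evaluate",
		"--runtime", runtime,
		"--parallel", strconv.Itoa(parallel),
	}
}

func main() {
	// Skip instead of failing when the environment has no Docker binary.
	if !dockerAvailable() {
		fmt.Println("docker not found, skipping")
		return
	}

	// One run per parallelism level; a real test would execute the command
	// and assert on its results instead of printing the arguments.
	for _, parallel := range []int{1, 2, 4} {
		fmt.Println(evaluateArguments("docker", parallel))
	}
}
```

In a real test file the availability check would likely become a `t.Skip` call and the loop a table-driven test, so that the same cases run locally and in CI.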