symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Multiple model parameters with the same value result in multiple evaluations #220

Open Munsio opened 2 days ago

Munsio commented 2 days ago

When the same model is specified multiple times, the evaluation currently runs once per occurrence. A model given X times is evaluated X times.

Example:

eval-dev-quality evaluate --runtime docker --result-path ./docker-test --runs 5 --model symflower/symbolic-execution --model symflower/symbolic-execution --model symflower/symbolic-execution --repository golang/plain

This evaluates the symflower/symbolic-execution model 3 times, each with 5 runs, because the model was passed as a parameter 3 times.

Question: Do we want this behavior, or should we deduplicate the list of models after parsing?
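If the answer is to deduplicate, a minimal sketch could look like the following. It assumes the parsed `--model` values are available as a plain `[]string`. The function name `uniqueModels` is hypothetical and is not part of the eval-dev-quality codebase. The sketch removes duplicates while keeping each model at the position where it first appeared.

```go
package main

import "fmt"

// uniqueModels is a hypothetical helper (not part of eval-dev-quality).
// It returns the model IDs with duplicates removed and keeps the order
// in which each ID first appeared on the command line.
func uniqueModels(models []string) []string {
	seen := make(map[string]struct{}, len(models))
	unique := make([]string, 0, len(models))
	for _, m := range models {
		if _, ok := seen[m]; ok {
			continue // Skip IDs that were already specified earlier.
		}
		seen[m] = struct{}{}
		unique = append(unique, m)
	}
	return unique
}

func main() {
	// The three identical --model values from the example command.
	models := []string{
		"symflower/symbolic-execution",
		"symflower/symbolic-execution",
		"symflower/symbolic-execution",
	}
	fmt.Println(uniqueModels(models)) // prints [symflower/symbolic-execution]
}
```

A map-based filter is used instead of sorting plus `slices.Compact` (Go 1.21+) because sorting would reorder the models the user listed. Another option would be to reject duplicate `--model` values with an error rather than dropping them silently.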