Dump the assessments into the CSV files as they happen, not at the end of all executions

symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

MIT License


TODO

1st iteration

- [x] Dump the assessments into evaluation.csv every time a task is executed

2nd iteration

- [x] Create the other CSVs from the evaluation.csv
  - read evaluation.csv > sum the results > write to file
  - [x] models-summed.csv
  - [x] <language>-summed.csv

- [x] Store the tasks and repositories in the evaluation records
  - https://github.com/symflower/eval-dev-quality/pull/241#discussion_r1666425640
- [ ] Add the model's human-readable name and its costs only when the raw records of the evaluation.csv are processed
  - https://github.com/symflower/eval-dev-quality/pull/241#discussion_r1666436342
- [x] Change EvaluationRecordsPerModel to be map[string][]*metrics.Assessments
  - https://github.com/symflower/eval-dev-quality/pull/241#discussion_r1668526526