DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Dump the assessments into the CSV files as they happen and not at the end of all executions #237
Open
ruiAzevedo19 opened 2 weeks ago
TODO

1st iteration

- Write the assessments to `evaluation.csv` every time a task is executed (a write sketch follows this item)
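A minimal sketch of the first iteration, using Go's standard `encoding/csv` package; `appendAssessment` and the column layout are illustrative assumptions, not DevQualityEval's actual API:

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
)

// appendAssessment opens the CSV file in append mode, writes a single record,
// and flushes immediately, so every finished task is persisted even if a
// later task crashes the evaluation run.
func appendAssessment(path string, record []string) error {
	file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer file.Close()

	writer := csv.NewWriter(file)
	if err := writer.Write(record); err != nil {
		return err
	}
	writer.Flush()

	return writer.Error()
}

func main() {
	// Hypothetical columns: model, language, task, score.
	if err := appendAssessment("evaluation.csv", []string{"some-model", "golang", "write-tests", "10"}); err != nil {
		log.Fatal(err)
	}
}
```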
2nd iteration

- Read `evaluation.csv` > sum the results > write to file `models-summed.csv` (a summing sketch follows this list)
- Read `evaluation.csv` > sum the results > write to file `<language>-summed.csv`
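With the incremental writes in place, the second iteration becomes a pure post-processing step: read the per-task rows back, aggregate, and write the summary file. A sketch under the assumption that each row starts with the model name and ends with a single integer score (the real `evaluation.csv` schema may differ):

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"sort"
	"strconv"
)

// sumPerModel reads per-task records (assumed layout: model, ..., score) and
// writes one total per model, sorted by model name so output is deterministic.
func sumPerModel(inPath string, outPath string) error {
	in, err := os.Open(inPath)
	if err != nil {
		return err
	}
	defer in.Close()

	records, err := csv.NewReader(in).ReadAll()
	if err != nil {
		return err
	}

	totals := map[string]int{}
	for _, record := range records {
		score, err := strconv.Atoi(record[len(record)-1])
		if err != nil {
			return err
		}
		totals[record[0]] += score
	}

	// Sort the model names so the summed file is stable across runs.
	models := make([]string, 0, len(totals))
	for model := range totals {
		models = append(models, model)
	}
	sort.Strings(models)

	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()

	writer := csv.NewWriter(out)
	for _, model := range models {
		if err := writer.Write([]string{model, strconv.Itoa(totals[model])}); err != nil {
			return err
		}
	}
	writer.Flush()

	return writer.Error()
}

func main() {
	if err := sumPerModel("evaluation.csv", "models-summed.csv"); err != nil {
		log.Fatal(err)
	}
}
```

The same grouping logic would produce `<language>-summed.csv` by keying on the language column instead of the model column.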
Follow-ups

- Delete the `evaluation.csv` files once they are processed
- Change `EvaluationRecordsPerModel` to be `map[string][]*metrics.Assessments` (sketched after this list)
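A sketch of the proposed type from the second follow-up; `Assessments` is stubbed here since only the shape of the map matters, and the description of the current behavior is an assumption based on the issue title:

```go
package metrics

// Assessments is a stand-in for DevQualityEval's per-task assessment
// metrics; the real type lives in the project's "metrics" package.
type Assessments map[string]uint64

// EvaluationRecordsPerModel maps a model's ID to all of its per-task
// assessment records rather than a single aggregated record (assumed),
// so summing can happen when the summary CSV files are written.
type EvaluationRecordsPerModel map[string][]*Assessments
```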