symflower/eval-dev-quality
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License
57
stars
3
forks
Improve maintainability of assessments by abstracting away details of how assessments are stored
#178
ahumenberger
closed
2 weeks ago
0
fix, Allow arbitrary content immediately after code tag
#177
bauersimon
closed
2 weeks ago
0
If results folder already exists, add suffix but don't overwrite or error
#176
bauersimon
closed
2 weeks ago
0
Collect Go coverage if tests trigger panic
#175
bauersimon
closed
2 weeks ago
1
Deal with dependencies requested by LLMs
#174
ahumenberger
opened
2 weeks ago
2
LLM result parsing bug
#173
bauersimon
closed
2 weeks ago
3
fix, Use a backoff for retrying LLM queries because it seems that some LLMs need longer to recover
#172
zimmski
closed
2 weeks ago
0
fix, Fail tests immediately in case tool is outdated or unusable
#171
bauersimon
closed
2 weeks ago
0
Code repairing task to enable models to fix code with compilation errors
#170
ruiAzevedo19
closed
1 week ago
1
Improve maintainability of assessments
#169
ahumenberger
closed
2 weeks ago
2
Evaluation task: Code repair
#168
ruiAzevedo19
closed
1 week ago
0
Fixed timeouts for `symflower unit-tests` and `symflower test`
#167
Munsio
opened
3 weeks ago
2
Introduce the concept of "tasks" to prepare for different evaluation tasks like "write tests" and "repair code"
#166
ahumenberger
closed
2 weeks ago
1
Support multiple evaluation tasks
#165
ahumenberger
opened
3 weeks ago
0
`ollama_llama_server` and other background processes we start must be killed on CTRL+C
#164
zimmski
opened
3 weeks ago
3
Automatic selection of repositories is broken
#163
zimmski
closed
1 week ago
0
Check for Java and Go compilation errors when building a project, in preparation for the compile-error code repair task
#162
ruiAzevedo19
closed
2 weeks ago
2
Do not ignore coverage count if there are failing tests
#161
ahumenberger
closed
3 weeks ago
0
New task to check for Go and Java compilation errors
#160
ruiAzevedo19
closed
2 weeks ago
0
https://github.com/symflower/eval-dev-quality/pull/155/files is missing a test
#159
zimmski
closed
1 week ago
3
Deal with failing tests
#158
zimmski
closed
3 weeks ago
0
Logic for "Create temporary repositories for each language so the repository is copied only once per language." copies more than needed
#157
zimmski
closed
2 weeks ago
0
Running Ollama tests with the wrong Ollama binary should fail hard
#156
zimmski
closed
2 weeks ago
1
Test file path needs to be OS aware
#155
Munsio
closed
3 weeks ago
0
Update Ollama to 0.1.41 to have all the latest Windows fixes
#154
bauersimon
closed
3 weeks ago
0
Download Go dependencies when executing tests
#153
bauersimon
closed
3 weeks ago
0
The prompt uses different paths depending on the OS
#152
Munsio
opened
3 weeks ago
0
Evaluation folder with date cannot be created on Windows
#151
bauersimon
opened
3 weeks ago
0
Add the testify package dependency to the Golang light repository, so `symflower test` can execute the generated tests
#150
ruiAzevedo19
closed
3 weeks ago
0
Scripts to fetch ollama/openrouter models
#149
Munsio
closed
2 weeks ago
0
Reset repository per task
#148
bauersimon
closed
1 month ago
0
Repository not reset for multiple tasks
#147
bauersimon
closed
1 month ago
0
Use empty Git config in temporary repositories
#146
bauersimon
closed
1 month ago
0
The git repository change requires the GPG password
#145
Munsio
closed
1 month ago
0
Require at least symflower v36800
#144
bauersimon
closed
1 month ago
0
Java
#143
ruiAzevedo19
closed
1 month ago
0
Track how many characters were present in a model response and generated test files
#142
ruiAzevedo19
closed
1 month ago
0
Follow-Up from using Git to reset the temporary directory
#141
Munsio
closed
2 weeks ago
0
Cancel previous runs of the CI when a new push is made to a PR
#140
Munsio
closed
1 month ago
0
Explicitly check the interface that is setting the query attempts, to ensure the model implements all its methods
#139
ruiAzevedo19
closed
1 month ago
0
refactor, Move the error used in the evaluation tests to a variable, to avoid copying it across the test suites
#138
ruiAzevedo19
closed
1 month ago
0
Remove the need to change the provider registry in tests to make test code concurrency safe
#137
ruiAzevedo19
closed
1 month ago
1
refactor, Move evaluation logic into evaluation package for isolation of concern
#136
zimmski
closed
1 month ago
0
Test for pulling Ollama model is flaky
#135
zimmski
opened
1 month ago
2
More Java task cases for test generation
#134
zimmski
closed
1 month ago
1
Make sure to use uint64 consistently for metrics and scoring, and allow more task cases by always working on a clean repository
#133
zimmski
closed
1 month ago
1
Clean up query attempt code
#132
zimmski
closed
1 month ago
1
Follow-up: Allow retrying a model when it errors
#131
zimmski
closed
1 month ago
0
Combined early merges
#130
ruiAzevedo19
closed
1 month ago
1
Do not cancel successive runs if previous runs had problems
#129
bauersimon
closed
1 month ago
0