symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Add new subcommand `verify` that just does repository verification and run that in CI #330

Open bauersimon opened 2 months ago

bauersimon commented 2 months ago

We added the Ruby testdata without much thought, and its repository structure is not correct. For example, we check that there is only one test file for some task types, but the Ruby repositories also contain a `test_init.rb`, which then triggers an error. This should be a CI failure.
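
To make the kind of check described above concrete, here is a minimal sketch of a "exactly one test file" verification. It is not the project's implementation: the function names `find_test_files` and `verify_repository` are hypothetical, and the rule for what counts as a Ruby test file (a name starting with `test_` or ending with `_test.rb`) is an assumption made only for illustration.

```python
from pathlib import Path


def find_test_files(repository: Path) -> list[str]:
    """Return the relative paths of all files that look like Ruby test files.

    Assumption: a test file is any `.rb` file whose name starts with
    `test_` or ends with `_test.rb`. The real rule may differ.
    """
    return sorted(
        str(path.relative_to(repository))
        for path in repository.rglob("*.rb")
        if path.name.startswith("test_") or path.name.endswith("_test.rb")
    )


def verify_repository(repository: Path) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    test_files = find_test_files(repository)
    if len(test_files) != 1:
        return [
            f"{repository.name}: expected exactly one test file, "
            f"found {len(test_files)}: {test_files}"
        ]
    return []
```

Under these assumptions, a repository that contains both a regular test file and a `test_init.rb` yields one reported problem. A CI step could call such a function for every testdata repository and exit with a non-zero status whenever the combined list of problems is non-empty.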