symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Keep individual coverage files and LLM query/responses #204

Status: open, opened by zimmski 1 week ago

zimmski commented 1 week ago

We need to keep all interactions, meaning the LLM queries and responses as well as the coverage files we are collecting.

bauersimon commented 1 week ago

What about https://github.com/symflower/eval-dev-quality/issues/181 then? Should we close it?

bauersimon commented 1 day ago

I think the cleanest solution would be to use logrus hooks. That way we can keep most of our logging as is, but e.g. log prompts with a special type=prompt attribute and add a hook that additionally writes the prompt content into a separate file.