symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Remove Docker evaluation volume if evaluation is aborted #366

Open bauersimon opened 1 month ago

Brief Description

Canceling an evaluation midway can leave a "dangling" Docker volume behind. The logs left in that volume then end up in the results of the next evaluation.

Reproducer

tree evaluation-2024-10-21-12\:23\:04/
evaluation-2024-10-21-12:23:04/
├── config.json
├── evaluation.log
├── symflower_symbolic-execution # First one contains the "light" repository.
│   ├── config.json
│   ├── evaluation.csv
│   ├── evaluation.log
│   └── write-tests
│       └── symflower_symbolic-execution
│           └── golang
│               └── golang
│                   ├── light
│                   │   └── evaluation.log
│                   └── plain
│                       └── evaluation.log
└── symflower_symbolic-execution-0 # Second one contains only "plain".
    ├── config.json
    ├── evaluation.csv
    ├── evaluation.log
    ├── README.md
    └── write-tests
        └── symflower_symbolic-execution
            └── golang
                └── golang
                    └── plain
                        └── evaluation.log

Logs

No response

Additional Information

No response

Version

5ca853c