symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

eval-dev-quality: command not found #336

Closed: Hambaobao closed this issue 2 months ago

Hambaobao commented 2 months ago

Hello, and thank you for your assistance. I encountered some issues during the evaluation process. I followed your instructions to install all the necessary packages:

git clone https://github.com/symflower/eval-dev-quality.git
cd eval-dev-quality
go install -v github.com/symflower/eval-dev-quality/cmd/eval-dev-quality

However, I'm facing an issue where the terminal returns: eval-dev-quality: command not found. Could you please help me identify what might be missing?

zimmski commented 2 months ago

Did you try to run the benchmark using the container image? https://github.com/symflower/eval-dev-quality#run-the-evaluation-either-with-the-built-or-pulled-image

Hambaobao commented 2 months ago

Thank you very much for your prompt reply. I'm not very familiar with Go. I just checked, and running which eval-dev-quality reported "eval-dev-quality not found". I eventually found the eval-dev-quality binary in /root/go/bin/. I will try again.
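
For readers hitting the same "command not found" error: go install writes the binary to the directory named by GOBIN or, when GOBIN is unset, to the bin subdirectory of GOPATH (here /root/go/bin/), and the shell only finds it if that directory is on PATH. A minimal sketch of the fix follows. The helper names path_contains and add_to_path are made up for illustration, and the sketch assumes GOBIN is unset and GOPATH holds a single directory.

```shell
# Return success if directory $1 is already an entry of $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Append directory $1 to PATH for the current shell unless it is already there.
add_to_path() {
  if ! path_contains "$1"; then
    PATH="$PATH:$1"
    export PATH
  fi
}

# `go env GOPATH` prints the Go workspace root; binaries built by
# `go install` end up in its bin/ subdirectory (assuming GOBIN is unset).
if command -v go >/dev/null 2>&1; then
  add_to_path "$(go env GOPATH)/bin"
fi
```

To make the change permanent, the same lines can go into the shell's startup file (for example ~/.profile). Invoking the binary by its full path, such as /root/go/bin/eval-dev-quality, also works without touching PATH.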

Hambaobao commented 2 months ago

Hi, I am now able to successfully conduct tests, but the final log shows:

2024/09/11 17:37:36 Excluding model “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” for language “java” because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” for language “java” because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” for language “java” because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” for language “ruby” because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” for language “ruby” because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” for language “ruby” because it did not succeed basic checks
2024/09/11 17:37:36 Evaluation score for “custom-vllm/DeepSeek-Coder-V2-Lite-Instruct” (“category-unknown”): score=2259, coverage=760, files-executed=52, files-executed-maximum-reachable=77, generate-tests-for-file-character-count=55919, processing-time=242195, response-character-count=56956, response-no-error=77, response-no-excess=75, response-with-code=75, tests-passing=1220

It appears I am unable to evaluate Java and Ruby. Is this normal, or is it caused by my environment not being configured correctly? My current environment can evaluate MultiPL-E without problems.
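
When reporting this kind of result, it can help to condense the repeated exclusion lines into one entry per language. The sketch below only summarizes log lines of the shape shown above and does not diagnose why the basic checks failed. The function name excluded_languages and the file name evaluation.log are hypothetical.

```shell
# Read an evaluation log on stdin and print each language that appears in an
# "Excluding model ... for language ..." line, one per line, without duplicates.
# The pattern tolerates straight or typographic quotes around the language name.
excluded_languages() {
  sed -n 's/.*for language [^a-zA-Z0-9]*\([a-zA-Z0-9_+#-]*\)[^a-zA-Z0-9]* because it did not succeed basic checks.*/\1/p' \
    | sort -u
}

# Usage (evaluation.log is a placeholder for wherever the log was saved):
#   excluded_languages < evaluation.log
```

For the log in this comment, the summary would list java and ruby, the two languages the model was excluded from.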