eval-dev-quality: command not found

Hambaobao commented 2 months ago

Hello, and thank you for your assistance. I encountered some issues during the evaluation process. I followed your instructions to install all the necessary packages:

git clone https://github.com/symflower/eval-dev-quality.git
cd eval-dev-quality
go install -v github.com/symflower/eval-dev-quality/cmd/eval-dev-quality
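For reference, go install does not put the binary in the current directory. It writes it to $GOBIN if that is set, and otherwise to the bin directory under the first $GOPATH entry (GOPATH defaults to $HOME/go). The minimal POSIX sh sketch below prints that directory. The function name go_bin_dir is hypothetical and is not part of Go or eval-dev-quality.

```sh
# Hypothetical helper: print the directory where `go install` places binaries.
# Rule: $GOBIN if set, else the "bin" subdirectory of the first $GOPATH entry
# ($GOPATH itself defaults to $HOME/go).
go_bin_dir() {
  if command -v go >/dev/null 2>&1; then
    # Ask the Go toolchain, which also applies its own defaults
    _gobin="$(go env GOBIN)"
    _gopath="$(go env GOPATH)"
  else
    # No Go toolchain on PATH: fall back to the environment and Go's default
    _gobin="${GOBIN:-}"
    _gopath="${GOPATH:-$HOME/go}"
  fi
  if [ -n "$_gobin" ]; then
    printf '%s\n' "$_gobin"
  else
    # GOPATH may be a colon-separated list; only the first entry is used
    printf '%s/bin\n' "${_gopath%%:*}"
  fi
}
```

Running ls -l "$(go_bin_dir)/eval-dev-quality" then shows whether the binary was installed there.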

However, when I try to run it, the terminal returns "eval-dev-quality: command not found". Could you please help me identify what might be missing?

zimmski commented 2 months ago

Can you please post the full output of the console? There is usually a trace showing which command is missing.
Can you also please post the output of which eval-dev-quality?

Did you try to run the benchmark using the container image? https://github.com/symflower/eval-dev-quality#run-the-evaluation-either-with-the-built-or-pulled-image

Hambaobao commented 2 months ago

Thank you very much for your prompt reply. I'm not very familiar with Go. I just checked, and which eval-dev-quality reported that eval-dev-quality was not found. I eventually found the eval-dev-quality binary in /root/go/bin/, which is not on my PATH. I will try again.
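Since /root/go/bin is not on PATH, the shell cannot find the command by name. A minimal POSIX sh sketch that appends a directory to PATH only if it is not already listed (the helper name add_to_path is hypothetical):

```sh
# Hypothetical helper: append a directory to PATH unless it is already present.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;  # already on PATH: leave it unchanged
    # ${PATH:+$PATH:} avoids a leading empty entry when PATH is empty
    *) PATH="${PATH:+$PATH:}$1"; export PATH ;;
  esac
}
```

After add_to_path /root/go/bin, which eval-dev-quality should print the full path of the binary. This only affects the current shell; to make it permanent, an equivalent export PATH="$PATH:/root/go/bin" line would go into the shell's startup file (for example ~/.bashrc, assuming bash).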

Hambaobao commented 2 months ago

Hi, I can now run the evaluation successfully, but the final log shows:

2024/09/11 17:37:36 Excluding model "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" for language "java" because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" for language "java" because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" for language "java" because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" for language "ruby" because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" for language "ruby" because it did not succeed basic checks
2024/09/11 17:37:36 Excluding model "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" for language "ruby" because it did not succeed basic checks
2024/09/11 17:37:36 Evaluation score for "custom-vllm/DeepSeek-Coder-V2-Lite-Instruct" ("category-unknown"): score=2259, coverage=760, files-executed=52, files-executed-maximum-reachable=77, generate-tests-for-file-character-count=55919, processing-time=242195, response-character-count=56956, response-no-error=77, response-no-excess=75, response-with-code=75, tests-passing=1220

It appears I cannot evaluate Java and Ruby. Is this expected, or is my environment not configured correctly? My current environment can evaluate MultiPL-E without problems.
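One possible cause worth ruling out, stated as an assumption rather than a confirmed diagnosis: the Java and Ruby tasks presumably need their toolchains (for example a JDK with Maven for Java, and a Ruby interpreter for Ruby) installed and on PATH. The POSIX sh sketch below prints every listed command that cannot be found. The helper name missing_tools and the tool names are assumptions, not taken from eval-dev-quality's source.

```sh
# Hypothetical helper: print each given command that is not on PATH,
# one per line. Returns 0 if all were found, 1 otherwise.
missing_tools() {
  _missing=""
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      _missing="$_missing $_tool"
    fi
  done
  # Tool names contain no spaces, so word splitting is safe here
  for _tool in $_missing; do
    printf '%s\n' "$_tool"
  done
  [ -z "$_missing" ]
}
```

For example, missing_tools java mvn ruby prints nothing if all three commands are found.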

symflower/eval-dev-quality issue #336