symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Better error handling - panic instead of "failing up" #28

Open rnbwdsh opened 5 months ago

rnbwdsh commented 5 months ago

If gotestsum isn't installed, or your $GOPATH isn't in your $PATH, the call to gotestsum fails, but only after the model has already been queried.

Here is an example output. The only relevant part is the `exec: "gotestsum": executable file not found in $PATH` error, and it is buried in the middle of log "spam" that isn't really relevant.

Another common error at about the same point was that my generated program tried to import libraries that I don't have installed.

    GOROOT=/usr/lib/go #gosetup
    GOPATH=/usr/lib/go/bin #gosetup
    /usr/lib/go/bin/go build -o /home/m/.cache/JetBrains/IntelliJIdea2023.2/tmp/GoLand/_2codegemma7b github.com/symflower/eval-dev-quality/cmd/eval-dev-quality #gosetup
    /home/m/.cache/JetBrains/IntelliJIdea2023.2/tmp/GoLand/2codegemma_7b evaluate --model ollama/codegemma:7b
    2024/04/11 13:02:26 Checking that models and languages can be used for evaluation
    2024/04/11 13:02:26 Evaluating model "ollama/codegemma:7b" using language "golang" and repository "/home/m/IdeaProjects/eval-dev-quality/testdata/golang/plain"
    2024/04/11 13:02:30 Model "ollama/codegemma:7b" responded to query
    Given the following Go code file "plain.go" with package "plain", provide a test file with the same structure and package="plain" for this code. The tests should produce 100 percent code coverage and must compile. The response must contain only the raw test code and nothing else.

    ```golang
    package plain

    func plain() {
            return // This does not do anything but it gives us a line to cover.
    }
    ```

       Start your answer with tripple backticks.

with:

    ```go
    package plain

    import "testing"

    func TestPlain(t *testing.T) {
            plain()
    }
    ```

    2024/04/11 13:02:30 $ gotestsum --format standard-verbose --hide-summary skipped -- -cover -v -vet=off ./...
    2024/04/11 13:02:30 Evaluated model "ollama/codegemma:7b" using language "golang" and repository "/home/m/IdeaProjects/eval-dev-quality/testdata/golang/plain": encountered 1 problems
    2024/04/11 13:02:30 Excluding model "ollama/codegemma:7b" since it was not able to solve the "plain" repository for language "golang": [exec: "gotestsum": executable file not found in $PATH
    github.com/symflower/eval-dev-quality/language.(*LanguageGolang).Execute
        /home/m/IdeaProjects/eval-dev-quality/language/golang.go:82
    github.com/symflower/eval-dev-quality/evaluate.EvaluateRepository
        /home/m/IdeaProjects/eval-dev-quality/evaluate/repository.go:55
    github.com/symflower/eval-dev-quality/cmd/eval-dev-quality/cmd.(*Evaluate).Execute
        /home/m/IdeaProjects/eval-dev-quality/cmd/eval-dev-quality/cmd/evaluate.go:117
    github.com/jessevdk/go-flags.(*Parser).ParseArgs
        /usr/lib/go/bin/pkg/mod/github.com/jessevdk/go-flags@v1.5.1-0.20210607101731-3927b71304df/parser.go:335
    github.com/jessevdk/go-flags.(*Parser).Parse
        /usr/lib/go/bin/pkg/mod/github.com/jessevdk/go-flags@v1.5.1-0.20210607101731-3927b71304df/parser.go:191
    github.com/symflower/eval-dev-quality/cmd/eval-dev-quality/cmd.Execute
        /home/m/IdeaProjects/eval-dev-quality/cmd/eval-dev-quality/cmd/command.go:26
    main.main
        /home/m/IdeaProjects/eval-dev-quality/cmd/eval-dev-quality/main.go:8
    runtime.main
        /usr/lib/go/src/runtime/proc.go:271
    runtime.goexit
        /usr/lib/go/src/runtime/asm_amd64.s:1695
    plain.go]
    2024/04/11 13:02:30 Evaluating models and languages
    2024/04/11 13:02:30 Evaluation score for "ollama/codegemma:7b": #executed=0.0%(0/1), #problems=100.0%(1/1), average statement coverage=0.0%

Evaluating with a symflower model when the symflower tooling isn't installed also doesn't fail very gracefully.
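To illustrate the kind of fail-fast behavior I mean, here is a minimal sketch that checks for required executables before any model is queried. It is not taken from the project's code: `missingTools` is a hypothetical helper name, and the list of tools to check is an assumption (only `gotestsum` appears in the log above). `exec.LookPath` is the standard library call that produces the same "executable file not found in $PATH" error.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// missingTools is a hypothetical helper: it returns the names of the given
// executables that cannot be resolved via $PATH.
func missingTools(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumption: "gotestsum" is the only external tool needed here. A real
	// check would list every binary the selected languages and models need.
	if missing := missingTools("gotestsum"); len(missing) > 0 {
		fmt.Fprintf(os.Stderr, "required tools not found in $PATH: %v\n", missing)
		os.Exit(1)
	}
	fmt.Println("all required tools found")
}
```

Running such a check at startup would abort with one clear message instead of querying the model first and reporting the missing binary as a model failure.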

zimmski commented 4 months ago

There are two parts to this issue: