sourcegraph / cody

AI that knows your entire codebase
https://cody.dev
Apache License 2.0

Bench: add LLM judge for response scoring to chat strategy #4678

Closed by abeatrix 4 days ago

abeatrix commented 5 days ago

There is still follow-up work we can do, including improving the prompt used for the LLM judge, but it currently works as intended and is, IMO, a good starting point for us to build from.

Test plan

Currently being used by https://github.com/sourcegraph/cody-leaderboard/pull/8

github-actions[bot] commented 5 days ago

‼️ Hey @sourcegraph/cody-security, please review this PR carefully as it introduces the usage of an unsafe_ function or abuses PromptString.

abeatrix commented 4 days ago

As discussed with @jtibshirani, I am running into issues re-running the tests, so we will merge what we have now and update the prompt and scoring in follow-up PRs.