Closed abeatrix closed 4 days ago
‼️ Hey @sourcegraph/cody-security, please review this PR carefully as it introduces the usage of an unsafe_ function or abuses PromptString.
As discussed with @jtibshirani, I am running into issues re-running the tests, so we will merge what we have now and update the prompt and scoring in follow-up PRs.
- llmJudgeChatTemplate to generate a prompt for evaluating LLM responses
- LlmJudge integrated into the evaluateChatStrategy to score each chat response
- An EvaluationDocument for each chat response

There is still follow-up work we can do, including improving the prompt used for the LLM judge, but it currently works as intended and is, IMO, a good starting point for us to build from.
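For reviewers unfamiliar with the LLM-as-judge pattern, here is a minimal sketch of the flow described above: build a judging prompt from a chat response, send it to a judge model, and parse a score from the reply. Every name here (ChatSample, JudgeResult, JudgeModel, buildJudgePrompt, parseJudgeReply, judgeAll), the prompt wording, and the 1 to 5 "SCORE:" format are assumptions for illustration; they are not the template, types, or scoring scheme used in this PR.

```typescript
// Hypothetical shapes for illustration only; not the types used in this PR.
interface ChatSample {
    question: string
    response: string
}

interface JudgeResult {
    score: number | null // null when the judge's reply could not be parsed
    reasoning: string
}

// A judge model is any async function from prompt text to the raw reply text.
type JudgeModel = (prompt: string) => Promise<string>

// Builds the judging prompt. The wording and the SCORE line format are assumptions.
function buildJudgePrompt(sample: ChatSample): string {
    return [
        'You are grading an answer from a coding assistant.',
        `Question:\n${sample.question}`,
        `Answer:\n${sample.response}`,
        'Explain your reasoning, then end with a line "SCORE: <n>" where n is an integer from 1 to 5.',
    ].join('\n\n')
}

// Extracts the score line from the judge's reply; everything else is kept as reasoning.
function parseJudgeReply(reply: string): JudgeResult {
    const match = reply.match(/^SCORE:\s*([1-5])\s*$/m)
    return {
        score: match ? Number(match[1]) : null,
        reasoning: reply.replace(/^SCORE:.*$/m, '').trim(),
    }
}

// Scores each chat response in order, one judge call per response.
async function judgeAll(samples: ChatSample[], model: JudgeModel): Promise<JudgeResult[]> {
    const results: JudgeResult[] = []
    for (const sample of samples) {
        results.push(parseJudgeReply(await model(buildJudgePrompt(sample))))
    }
    return results
}
```

Keeping prompt construction and reply parsing as separate pure functions means the prompt and the scoring rule can each be changed in follow-up PRs without touching the loop that calls the judge.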
Test plan
Currently being used by https://github.com/sourcegraph/cody-leaderboard/pull/8