Chat: Update context preamble to minimize hedgings

abeatrix commented 3 days ago

Part of https://linear.app/sourcegraph/issue/CODY-2507/evaluate-claude-35-sonnet-as-new-default

Update the context preamble so that it works across different models while minimizing hedging.

Test plan

Updated the leaderboard locally with results from this change in https://github.com/sourcegraph/cody-leaderboard/pull/12

With the updated preamble, hedging occurrences decreased (I ran into issues running the evals, so I removed the Gemini models to make the run faster):

The one question that failed looks like a false positive to me:

[Screenshot: Before]

jtibshirani commented 2 days ago

I like how the preamble is still simple and to the point! I think we need to apply it only to the Claude 3.5 models, though. I dug through the leaderboard results and noticed significantly more hallucination from all the other models with this preamble: https://github.com/sourcegraph/cody-leaderboard/pull/12#issuecomment-2203937606.

I think of this addition as a "confidence preamble" :) We should apply it only to models that demonstrate excessive hedging behavior, like Claude 3.5 (and previously Claude 2.1).
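The suggestion above amounts to gating the extra "confidence preamble" on the model ID. Below is a minimal TypeScript sketch of that idea. It assumes hypothetical names (`buildPreamble`, `isHedgingProne`, `HEDGING_PRONE_MODELS`) and uses placeholder preamble text, not the wording or code actually used in this PR.

```typescript
// Hypothetical sketch: names, model-ID patterns, and preamble wording are
// illustrative assumptions, not Cody's actual implementation.

// Base preamble sent to every chat model (placeholder text).
const BASE_PREAMBLE = "You are Cody, an AI coding assistant from Sourcegraph.";

// Extra "confidence preamble" intended only for models that hedge excessively
// (placeholder wording, not the text from this PR).
const CONFIDENCE_PREAMBLE =
  "Answer questions about the provided code context directly and confidently.";

// Model-ID substrings assumed to identify hedging-prone models.
// Only the Claude 3.5 family is listed, per the review feedback.
const HEDGING_PRONE_MODELS: readonly string[] = ["claude-3-5"];

// Returns true when the model ID matches one of the allowlisted patterns.
function isHedgingProne(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return HEDGING_PRONE_MODELS.some((pattern) => id.includes(pattern));
}

// Builds the preamble for a model, appending the confidence text only for
// allowlisted models so every other model keeps the base preamble unchanged.
function buildPreamble(modelId: string): string {
  return isHedgingProne(modelId)
    ? `${BASE_PREAMBLE}\n\n${CONFIDENCE_PREAMBLE}`
    : BASE_PREAMBLE;
}

// Example usage:
console.log(buildPreamble("anthropic/claude-3-5-sonnet-20240620")); // base + confidence text
console.log(buildPreamble("openai/gpt-4o")); // base text only
```

An allowlist keeps the change opt-in: models that showed more hallucination with the extra text never receive it, and another hedging-prone model can be added later by extending the list.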

abeatrix commented 2 days ago

apply it only to the Claude 3.5 models though

Ahh, good point! Let me make that change. Thanks for the suggestion, @jtibshirani!

sourcegraph / cody

Chat: Update context preamble to minimize hedgings #4742
