Switch default model to gpt-4-turbo

kalanchan commented 8 months ago

@beyang we chatted about this during the grooming session. Could you provide a summary of what you're thinking here? We should also loop in @chillatom, as he's working on things that affect retention and this could be one of them.

chillatom commented 8 months ago

We should roll this into the retention testing work.

Slack discussion on the topic

Summary: I suspect that user preference comes into play with the "best in class" models. We should test which default setting yields the greatest user retention. The added cost of a better model can be weighed against the ROI of providing a better experience. If a cheaper model yields the same user retention, we should minimize cost. If a better model is more expensive but yields a better UX, and that improves retention enough to justify the cost, great.

At a minimum, we should upgrade to the Claude 3.0 Sonnet model as the baseline (no A/B needed). Then we should test the two best-in-class models against that baseline.

1. Internally test the Claude 3 variants and convince ourselves that there are no bugs.
2. Next release, change the default to 3.0 Sonnet, pending any findings from (1).
3. Also launch a test that evaluates Sonnet vs. Opus vs. GPT4-t as the default model. Measure the impact on retention and make a call, i.e. is GPT-4 so much better that it warrants the additional cost per user?
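
To make the cost-versus-retention call concrete, here is a rough sketch of how the comparison could be scored. Everything in it is an assumption for illustration: the `Arm` type, the `value_per_retained_user` parameter, and all the numbers are hypothetical placeholders, not measured results. It also ignores statistical significance, which a real test readout would need to check before making a call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """One test arm: a candidate default model (all fields hypothetical)."""
    name: str
    retained_users: int
    total_users: int
    cost_per_user: float  # assumed LLM cost per user over the test window

    @property
    def retention(self) -> float:
        return self.retained_users / self.total_users


def net_value_per_user(arm: Arm, value_per_retained_user: float) -> float:
    """Expected value of a user in this arm, minus what the model costs us."""
    return arm.retention * value_per_retained_user - arm.cost_per_user


def pick_default(arms: list[Arm], value_per_retained_user: float) -> Arm:
    """Pick the arm with the highest net value; on a tie, prefer the cheaper model."""
    return max(
        arms,
        key=lambda a: (net_value_per_user(a, value_per_retained_user), -a.cost_per_user),
    )


# Placeholder numbers only, chosen to show how the decision can flip.
arms = [
    Arm("sonnet", retained_users=400, total_users=1000, cost_per_user=1.0),
    Arm("opus", retained_users=420, total_users=1000, cost_per_user=5.0),
    Arm("gpt-4-turbo", retained_users=450, total_users=1000, cost_per_user=3.0),
]

# If a retained user is worth a lot, the more expensive model can pay for itself.
high_value_choice = pick_default(arms, value_per_retained_user=50.0)
# If a retained user is worth less, the cheaper baseline wins.
low_value_choice = pick_default(arms, value_per_retained_user=20.0)
```

The point of the sketch is only that the answer depends on how much a retained user is worth to us, so we need that estimate alongside the retention numbers from the test.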

kalanchan commented 7 months ago

closing this in favour of claude3

sourcegraph / cody

Switch default model to gpt-4-turbo #3316