sourcegraph / cody

Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
https://cody.dev
Apache License 2.0

Create prompt testing tool #220

Closed vdavid closed 8 months ago

vdavid commented 1 year ago

Links:

Idea list:

vdavid commented 1 year ago

@dominiccooney's input:

This is a great start. Here's what I need, based on my experience hacking prompts to this point, but maybe my use is so idiosyncratic you should not worry about it:

  • Inputs are N bags of strings. Some bags are noted as prompt; others as user input. Plus a function which takes one string from each bag and produces a closure which produces LLM output.
  • Kick off multiple concurrent requests. The key principle is that the driver never has to wait for LLMs to spin.
  • The driver sits rating results two-up, choosing which one is better.
  • An algorithm does stats magic to find the best prompt combination, and reports what that is, what the error bars are, etc. That algorithm could be GP, or something for sports team ranking, or RL, or whatever. This is where noting certain bags of strings as user input is important: you want the algorithm to specialize on the good prompts, but verify that performance generalizes across the range of user inputs.
  • Footnote: capture all of this data so we can RLHF our own models with it later.

If any of those ideas aren't clear, I'm happy for follow-ups.