This issue proposes the ability to use a local Runner instance in GenerateText._call rather than the passed-in instance.
This change is motivated by the need to run portions of a prompt with different runners. For instance, a prompt may run a generation on a different model than an evaluation. This is useful when migrating from one model to another, when comparing two models, or when limiting the use of a more expensive model.
Proposed client use
generation_runner = OpenAIChat(model_id="gpt-3.5-turbo")
evaluation_runner = OpenAIChat(model_id="gpt-4o")

arabic = list(range(1, 6))
latin = ["I", "II", "III", "IV", "V"]
records = [{"arabic": x, "roman": y} for x, y in zip(arabic, latin)]

as_latin = GenerateText(
    Template("Output as a latin numeral: {{input.arabic}}"),
    runner=generation_runner,
)
eval_output = GenerateText(
    Template(
        "Output '1' if the A and B are the same string and '0' otherwise.\n\n"
        "A: {{input.roman}}\n"
        "B: {{previous}}",
        previous=as_latin,
    )
)
result = Output(eval_output).run(evaluation_runner, records)
In the above example, generation_runner will be used to generate the as_latin result and evaluation_runner will be used to generate the final result.
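The runner-resolution logic this issue asks for could be sketched as below. The Template stand-in, the Runner stub, and the fallback expression are assumptions for illustration, not the library's actual implementation; only the GenerateText._call name and the "local runner overrides passed-in runner" behavior come from this proposal.

```python
class Runner:
    """Hypothetical base; concrete runners such as OpenAIChat would call a model."""

    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # Stubbed model call for illustration.
        return f"[{self.name}] {prompt}"


class Template:
    """Minimal stand-in: substitutes {{input.<key>}} placeholders from a record."""

    def __init__(self, text):
        self.text = text

    def render(self, record):
        out = self.text
        for key, value in record.items():
            out = out.replace("{{input." + key + "}}", str(value))
        return out


class GenerateText:
    def __init__(self, template, runner=None):
        self.template = template
        # Optional local runner, per this proposal.
        self.runner = runner

    def _call(self, runner, record):
        # Core of the proposal: prefer the locally configured runner,
        # falling back to the runner passed into run().
        effective = self.runner if self.runner is not None else runner
        return effective.generate(self.template.render(record))
```

With this fallback, a step constructed with `runner=generation_runner` routes its generation there, while steps constructed without a local runner continue to use whatever runner is passed to run(), so existing client code is unaffected.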