This issue proposes the ability to use a local Runner instance in GenerateText._call rather than the passed-in instance.
This change is motivated by the need to run portions of a prompt with different runners. For instance, a prompt may run a generation on a different model than an evaluation. This is useful when migrating from one model to another, when comparing two models, or when limiting the use of a more expensive model.
Proposed client use
generation_runner = OpenAIChat(model_id="gpt-3.5-turbo")
evaluation_runner = OpenAIChat(model_id="gpt-4o")

arabic = list(range(1, 6))
latin = ["I", "II", "III", "IV", "V"]
records = [{"arabic": x, "roman": y} for x, y in zip(arabic, latin)]

as_latin = GenerateText(
    Template("Output as a latin numeral: {{input.arabic}}"),
    runner=generation_runner,
)
eval_output = GenerateText(
    Template(
        "Output '1' if the A and B are the same string and '0' otherwise.\n\n"
        "A: {{input.roman}}\n"
        "B: {{previous}}",
        previous=as_latin,
    )
)
result = Output(eval_output).run(evaluation_runner, records)
In the above example, generation_runner will be used to generate the as_latin result and evaluation_runner will be used to generate the final result.
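The runner-resolution logic this issue asks for could be sketched as below. The Template stand-in, the Runner stub, and the fallback expression are assumptions for illustration, not the library's actual implementation; only the GenerateText._call name and the "local runner overrides passed-in runner" behavior come from this proposal.

```python
class Runner:
    """Hypothetical base; concrete runners such as OpenAIChat would call a model."""

    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # Stubbed model call for illustration.
        return f"[{self.name}] {prompt}"


class Template:
    """Minimal stand-in: substitutes {{input.<key>}} placeholders from a record."""

    def __init__(self, text):
        self.text = text

    def render(self, record):
        out = self.text
        for key, value in record.items():
            out = out.replace("{{input." + key + "}}", str(value))
        return out


class GenerateText:
    def __init__(self, template, runner=None):
        self.template = template
        # Optional local runner, per this proposal.
        self.runner = runner

    def _call(self, runner, record):
        # Core of the proposal: prefer the locally configured runner,
        # falling back to the runner passed into run().
        effective = self.runner if self.runner is not None else runner
        return effective.generate(self.template.render(record))
```

With this fallback, a step constructed with `runner=generation_runner` routes its generation there, while steps constructed without a local runner continue to use whatever runner is passed to run(), so existing client code is unaffected.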