prescod opened this issue 10 months ago
I thought this was a duplicate of #502, but I have verified that it also occurs with 0.1.5. This is not a duplicate of #502, which is caused by constrained generation.
In each generation, the model's _shared_state["num_calls_made"] counter is incremented until max_calls is reached.
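For intuition, here is a rough sketch of that guard. The class and error message are hypothetical stand-ins, not guidance's actual source; the point is that the counter lives in state shared across generations, so reusing one object eventually trips the limit:

```python
# Hypothetical sketch of the call-counting guard described above;
# not guidance's actual implementation.
class ModelSketch:
    def __init__(self, max_calls: int = 10):
        self._shared_state = {"num_calls_made": 0}
        self.max_calls = max_calls

    def generate(self, prompt: str) -> str:
        # Every generation bumps the shared counter; once it reaches
        # max_calls, further calls are refused.
        if self._shared_state["num_calls_made"] >= self.max_calls:
            raise RuntimeError("max_calls limit reached")
        self._shared_state["num_calls_made"] += 1
        return f"<completion for {prompt!r}>"
```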
@prescod As per @tshu-w, this seems to be by design. You can set max_calls to change this limit (see https://github.com/guidance-ai/guidance/issues/502#issuecomment-1840162301), but you'll still need to address it in your design.
@ninowalker:
Why would there be a limit on the number of times I am allowed to call a models.OpenAI object? How would that help me?
Why is it a flaw in my design that I want to reuse a models.OpenAI object?
When I do not reuse the models.OpenAI object, its initialization takes up 90% of the runtime of my program. It is absolutely killing performance.
How could I address this in my design? I'm damned if I do reuse the object and damned if I don't.
@prescod - The library authors felt it necessary to have this max_calls limit, probably because of the way they try to compensate for feature variance across platforms; see #502. You can simply set it on your client after you've instantiated it:
```python
gpt = models.OpenAI("gpt-3.5-turbo")
gpt.max_calls = 10**6
```
In my experience, OpenAI answer quality degrades after a large number of tokens. Depending on your usage, this may or may not matter.
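For what it's worth, here is a sketch of the reuse pattern with the cap raised. The role blocks (system/user/assistant) assume guidance's 0.1.x chat API; treat this as illustrative rather than verified against your exact version:

```python
from guidance import models, gen, system, user, assistant

# Pay the slow tokenizer load once, then raise the call cap.
gpt = models.OpenAI("gpt-3.5-turbo")
gpt.max_calls = 10**6  # effectively unlimited for most workloads

for question in ["What is 2 + 2?", "Name a prime number."]:
    # Reuse the same client for every request instead of re-instantiating.
    with system():
        lm = gpt + "You are a terse assistant."
    with user():
        lm += question
    with assistant():
        lm += gen("answer", max_tokens=16)
    print(lm["answer"])
```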
The bug
When I try to reuse a models.OpenAI object (because the tokenizer is very slow to load), I get an error about max_calls. It's unclear how

To Reproduce

System info (please complete the following information):
OS: MacOS
Guidance Version (guidance.__version__): '0.1.5'