guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

`max_calls`/`max_repeated_calls` error #498

Open prescod opened 10 months ago

prescod commented 10 months ago

The bug: When I try to reuse a `models.OpenAI` object (because the tokenizer is very slow to load), I get an error about `max_calls`. It's unclear how to avoid this.

To Reproduce

```python
from guidance import models, gen, user, system, assistant
import time

gpt = models.OpenAI("gpt-3.5-turbo")

def doit():
    with user():
        lm = gpt + "What is the capital of France?"

    with assistant():
        lm += gen("capital", temperature=0)

    return lm["capital"]

# Re-creating the model each iteration: no crashing, but really slow
for i in range(12):
    start = time.time()
    gpt = models.OpenAI("gpt-3.5-turbo")
    print(doit())
    print(time.time() - start)

# Reusing the model: faster, but crashes once the call limit is hit
for i in range(12):
    start = time.time()
    print(doit())
    print(time.time() - start)
```

System info: macOS

ninowalker commented 10 months ago

I thought this was a duplicate of #502, but I have also verified it with 0.1.5.

tshu-w commented 10 months ago

This is not a duplicate of #502, which is caused by constrained generation.

On each generation, `model._shared_state["num_calls_made"]` is incremented until `max_calls` is reached.
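For readers unfamiliar with this kind of guard, here is a minimal toy sketch of how a per-instance call budget like the one described above behaves. This mimics the `num_calls_made`/`max_calls` bookkeeping in spirit only; the class and method names are invented and are not guidance's actual internals:

```python
class BudgetedModel:
    """Toy stand-in illustrating a call-budget guard (a sketch,
    not guidance's real implementation)."""

    def __init__(self, max_calls=10):
        self.max_calls = max_calls
        self._shared_state = {"num_calls_made": 0}

    def gen(self, prompt):
        # Every generation bumps the shared counter; once the budget
        # is exceeded, further calls are refused.
        self._shared_state["num_calls_made"] += 1
        if self._shared_state["num_calls_made"] > self.max_calls:
            raise RuntimeError(
                f"model was called more than {self.max_calls} times"
            )
        return f"response to {prompt!r}"

m = BudgetedModel(max_calls=2)
m.gen("a")
m.gen("b")
try:
    m.gen("c")  # third call exceeds the budget of 2
except RuntimeError as e:
    print("raised:", e)
```

Because the counter lives in shared state on the instance, reusing one object across many prompts (as in the repro above) eventually trips the limit even though each individual prompt is cheap.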

ninowalker commented 9 months ago

@prescod As per @tshu-w , this seems to be by design. You can set the max_calls to change this limit (see https://github.com/guidance-ai/guidance/issues/502#issuecomment-1840162301), but you'll still need to address it in your design.

prescod commented 9 months ago

@ninowalker :

Why would there be a limit to the number of times I am allowed to call a `models.OpenAI` object? How would that help me?

Why is it a flaw in my design that I want to reuse a models.OpenAI object?

When I do not reuse the models.OpenAI object, its initialization takes up 90% of the runtime of my program. It is absolutely killing performance.

How could I address this in my design? I'm damned if I do reuse the object and damned if I don't.

ninowalker commented 9 months ago

@prescod - The library authors felt it necessary to have this max_calls limit, probably because of the way they try to compensate for feature variance across platforms, see #502.

You can simply set it on your client after you've instantiated it:

```python
gpt = models.OpenAI("gpt-3.5-turbo")
gpt.max_calls = 10**6
```

In my experience, OpenAI answer quality degrades after a large number of tokens. Depending on your usage, this may or may not matter.
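If raising `max_calls` feels fragile, another way to amortize the slow tokenizer load is to construct the model once and hand the same instance to every caller, e.g. via a cached factory. This is a generic sketch: `make_model` is a hypothetical helper, and `SlowModel` is a stand-in for `models.OpenAI` so the example runs without network access:

```python
from functools import lru_cache

class SlowModel:
    """Stand-in for models.OpenAI: pretend construction is expensive."""
    constructions = 0

    def __init__(self, name):
        type(self).constructions += 1  # count how often we pay the cost
        self.name = name
        self.max_calls = 10**6  # raise the budget once, at creation time

@lru_cache(maxsize=None)
def make_model(name):
    # The expensive constructor runs once per model name; subsequent
    # calls return the cached instance.
    return SlowModel(name)

a = make_model("gpt-3.5-turbo")
b = make_model("gpt-3.5-turbo")
print(a is b, SlowModel.constructions)
```

The same pattern applies with the real `models.OpenAI`: build it in one place, bump `max_calls` there, and reuse the cached instance everywhere else.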