michelle123lam / lloom

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-level concepts to analyze unstructured text.
https://stanfordhci.github.io/lloom
BSD 3-Clause "New" or "Revised" License

Token Limit - Error: Rate limit reached for gpt-3.5-turbo #3

Open · DrorMarkus opened this issue 2 months ago

DrorMarkus commented 2 months ago

I attempted to run LLooM on a sample corpus of news articles (I am running Python 3.9 and downgraded the OpenAI version as stated in the instructions).

When I first tried to run, I received the following error when attempting distillation:

Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised 
RateLimitError: Rate limit reached for gpt-3.5-turbo in organization org-pJi85N3hjiXf3H4X0vI57QtL on tokens per min (TPM): Limit 80000, Used 79883, Requested 1075. Please try again in 718ms.

Since it appeared there were token limitation issues, I tried to cut down my texts to only the headlines of the articles. The procedure then worked, running through the distillation, clustering, and synthesis stages, and I received 5 concepts: [screenshot: the 5 generated concepts]

However, upon proceeding with the scoring, the procedure gets stuck: [screenshot: scoring progress stalled]

Again there are token limit errors. It appears that multi_query_gpt_wrapper handles this for the distillation, but not for the scoring.
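
As a side note for anyone hitting similar TPM errors: a rough way to gauge how much a corpus will consume is to count tokens up front with tiktoken. This is a generic sketch, not part of LLooM; it assumes the input texts live in a pandas DataFrame column named "text".

import tiktoken

# Generic pre-flight token count (not LLooM functionality); assumes `df` is a
# pandas DataFrame with the input texts in a "text" column.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
token_counts = [len(enc.encode(str(t))) for t in df["text"]]
print(f"{len(token_counts)} texts, ~{sum(token_counts)} tokens total, "
      f"largest single text ~{max(token_counts)} tokens")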

michelle123lam commented 2 months ago

Hi @DrorMarkus, thanks for raising this! Sorry that this got in the way of your workflow. I'll look into this and see if we can gracefully handle rate limits for the scoring. As a workaround in the meantime, it might be helpful to use the batch_size argument in the call to the score() function (code here), like this:

# Current default batch_size=1
score_df = await l.score(batch_size=5)

This will batch multiple text examples into a single prompt, which reduces the number of independent calls to the OpenAI API. The downside is that, from what I've seen in my testing, scoring accuracy can sometimes suffer when multiple examples are batched together. I plan to add more info on additional arguments like this in the documentation.
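
As a rough illustration of the savings (numbers are made up, and the one-prompt-per-text-per-concept layout at batch_size=1 is a simplification for intuition, not a documented guarantee):

# Illustrative call-count math only; assumes ~one scoring prompt per text
# per concept at batch_size=1, which is a simplification.
n_texts, n_concepts = 1000, 5
for batch_size in (1, 5):
    n_calls = n_texts * n_concepts // batch_size
    print(f"batch_size={batch_size}: ~{n_calls} scoring calls")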

As another update re: your first line, last night I updated the package to require Python 3.10 and a newer OpenAI version (>=1.23.1)! Let me know if that helps your setup. It should be available as version 0.7.1 of text_lloom.
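
(If you're upgrading an existing install, something like pip install --upgrade text_lloom should pull in the new version.)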

zilinskyjan commented 2 months ago

Thank you @michelle123lam! @DrorMarkus, regarding your issue: I hit the gpt-3.5-turbo limits even before scoring, at the first (distillation) step. I should specify that it's the TPM limit, not RPM, that's the current blocker. My corpus isn't huge (N < 2000 rows, each a few sentences long).

Looking for a solution now (other than switching to gpt-4, which has more generous limits, though that partly depends on your account tier).

michelle123lam commented 2 months ago

@DrorMarkus @zilinskyjan As an update, I've added functionality to the lloom instance creation so that users can (1) specify which models are used for the operators and (2) specify custom rate-limit parameters! These changes are available in text_lloom version 0.7.2.

Specifying which models are used:

l = wb.lloom(
    df=df,
    text_col="text",

    # Model specification
    distill_model_name="gpt-3.5-turbo",
    embed_model_name="text-embedding-3-large",
    synth_model_name="gpt-4-turbo",
    score_model_name="gpt-3.5-turbo",
)
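
One way to read this example (an interpretation, not an official recommendation): route the high-volume distill and score operators to a cheaper, higher-throughput model, and reserve a stronger model like gpt-4-turbo for the single synthesis pass.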

Specifying custom rate-limit parameters:

l = wb.lloom(
    df=df,
    text_col="text",

    # Rate-limit parameters dict, keyed by model name:
    # "model-name": (n_requests, wait_time_secs)
    # Any model not listed falls back to the defaults in llm.py.
    rate_limits={
        "gpt-4-turbo": (40, 10),
    },
)
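
For intuition, here is a sketch of one reading of the (n_requests, wait_time_secs) tuple; the names below (throttled_calls, call_llm, prompts) are hypothetical placeholders, not LLooM's actual internals (see llm.py for the real implementation).

import asyncio

# Illustrative throttle: fire up to n_requests concurrent calls, then pause
# wait_time_secs before the next batch. `prompts` and `call_llm` are
# hypothetical placeholders for this sketch.
async def throttled_calls(prompts, call_llm, n_requests=40, wait_time_secs=10):
    results = []
    for i in range(0, len(prompts), n_requests):
        batch = prompts[i:i + n_requests]
        results += await asyncio.gather(*(call_llm(p) for p in batch))
        if i + n_requests < len(prompts):
            await asyncio.sleep(wait_time_secs)
    return results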

I'll add information about this to the documentation as well.