run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

OpenAI API "maximum context length" errors #253

Closed mariusnita closed 1 year ago

mariusnita commented 1 year ago

Seeing a bunch of errors coming back from the OpenAI API:

This model's maximum context length is 4097 tokens, however you requested 4303 tokens (4047 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

I'm just doing

index = gpt_index.GPTTreeIndex(docs)

With a list of docs I created manually from files on disk.

I construct the docs like this:

gpt_index.Document(
  contents, 
  extra_info=dict(filename=f, path=path)
)
In [2]: gpt_index.__version__
Out[2]: '0.2.7'
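
For reference, a fuller sketch of that construction (the directory walk and the path value here are hypothetical; only the Document call mirrors the snippet above):

import os
import gpt_index

path = "docs/"  # hypothetical directory of plain-text files
docs = []
for f in os.listdir(path):
    full_path = os.path.join(path, f)
    with open(full_path) as fh:
        contents = fh.read()
    # Same Document construction as in the report above.
    docs.append(gpt_index.Document(contents, extra_info=dict(filename=f, path=full_path)))

index = gpt_index.GPTTreeIndex(docs)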
jerryjliu commented 1 year ago

thanks @mariusnita, sorry about the issue. i'll take a look soon. Is this only with the tree index? Does it work for simple vector index?

mariusnita commented 1 year ago

I just tried, and I don't get any errors with the vector or list indexes.

BTW, I just noticed the vector index is 20x cheaper and faster to create, and seems to have much better question-answering performance than the tree index. (Although I was only able to create a partial tree index by discarding the failing chunks, so that may explain the poor performance.)

jerryjliu commented 1 year ago

> BTW, I just noticed the vector index is 20x cheaper and faster to create, and seems to have much better question-answering performance than the tree index. (Although I was only able to create a partial tree index by discarding the failing chunks, so that may explain the poor performance.)

Yeah it's a fair point, that's why the SimpleVectorIndex is the default mode in the quickstart :)

i've found the tree index to be more effective at 1) summarization (through construction of the tree itself), and decently ok at 2) routing, though of course embeddings can be used for (2) as well
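
For context, a rough sketch of the two index types using the 0.2.x-era API shown elsewhere in this thread (the query string is made up, and docs is the list of Document objects from the report above):

import gpt_index

# Quickstart default: embeds chunks at build time and retrieves by similarity
# at query time, so construction is cheap (embedding calls only).
vector_index = gpt_index.GPTSimpleVectorIndex(docs)
print(vector_index.query("What do these files contain?"))

# Tree index: summarizes chunks into parent nodes with LLM calls at build time
# (hence the extra cost), then traverses the tree at query time.
tree_index = gpt_index.GPTTreeIndex(docs)
print(tree_index.query("What do these files contain?"))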

mariusnita commented 1 year ago

Seeing the same error when using the code-davinci-002 model with the vector index:

llm_predictor = gpt_index.LLMPredictor(
    llm=langchain.OpenAI(
        temperature=0,
        model_name="code-davinci-002"
    )
)
index = gpt_index.GPTSimpleVectorIndex(
    docs,
    llm_predictor=llm_predictor
)

openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 9549 tokens (9549 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

jerryjliu commented 1 year ago

@mariusnita do you have sample data to help me repro by any chance? Feel free to DM me in the Discord

stanakaj commented 1 year ago

I had the same error with GPTSimpleVectorIndex and was able to get around it by passing a prompt_helper. For your information:

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4181 tokens (3925 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

from gpt_index import GPTSimpleVectorIndex, PromptHelper  # top-level exports in this era of gpt_index

max_input_size = 4096
num_output = 2000
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

index = GPTSimpleVectorIndex.load_from_disk(
    'index.json', prompt_helper=prompt_helper
)
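
The same workaround should also apply at construction time rather than load time; a sketch, assuming GPTSimpleVectorIndex accepts prompt_helper in its constructor the way load_from_disk does above:

from gpt_index import GPTSimpleVectorIndex, PromptHelper

# docs: the list of Document objects from earlier in the thread.
max_input_size = 4096   # the model allows 4097 tokens total
num_output = 2000       # tokens reserved for the completion, so prompt + completion fits
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

index = GPTSimpleVectorIndex(docs, prompt_helper=prompt_helper)
index.save_to_disk('index.json')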
jerryjliu commented 1 year ago

Thanks @stanakaj. Yeah, given the max input size this should be something gpt index handles under the hood; i'd be curious to see what the data is.

mariusnita commented 1 year ago

@jerryjliu This is likely a bad example because it's probably not useful to feed SVGs into gpt-index; nonetheless this causes gpt-index to crash:

https://www.roojs.org/roojs1/fonts/nunito/nunito-v16-latin-italic.svg

Example program:

import gpt_index
import langchain

filename = "nunito-v16-latin-italic.svg"

with open(filename) as f:
    contents = f.read()

docs = [gpt_index.Document(contents)]

llm_predictor = gpt_index.LLMPredictor(
    llm=langchain.OpenAI(temperature=0, model_name="code-davinci-002")
)
index = gpt_index.GPTSimpleVectorIndex(docs, llm_predictor=llm_predictor)
mariusnita commented 1 year ago

The same file causes GPTTreeIndex to fail:

import gpt_index

filename = "nunito-v16-latin-italic.svg"

with open(filename) as f:
    contents = f.read()

index = gpt_index.GPTTreeIndex(
    [gpt_index.Document(contents)],
)
jerryjliu commented 1 year ago

thanks @mariusnita, taking a look now

jerryjliu commented 1 year ago

Hi @mariusnita, just a quick note. The PR I linked partially fixes the issue but does not completely fix it for your use case. This is because, at the moment, there's no way to appropriately pre-compute the number of tokens used by text-embedding-ada-002 (the tokenizer i use is not aligned with the one openai uses): https://help.openai.com/en/articles/6824809-embeddings-frequently-asked-questions.

In the meantime, for your specific use case, can you manually set chunk_size_limit=4096 (a smaller number)? e.g.

index = GPTSimpleVectorIndex(docs, llm_predictor=llm_predictor, chunk_size_limit=4096)
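
Put together with the SVG repro above, a sketch (the cap just keeps each chunk, and therefore each embedding request, under the 8191-token limit from the error earlier in this thread):

import gpt_index
import langchain

with open("nunito-v16-latin-italic.svg") as f:
    contents = f.read()

docs = [gpt_index.Document(contents)]
llm_predictor = gpt_index.LLMPredictor(
    llm=langchain.OpenAI(temperature=0, model_name="code-davinci-002")
)

# chunk_size_limit caps how large each chunk can get when the document is split.
index = gpt_index.GPTSimpleVectorIndex(
    docs, llm_predictor=llm_predictor, chunk_size_limit=4096
)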
stanakaj commented 1 year ago

Thank you. #266 also fixes the error I reported.