Hi @aashay96, by default the chunk limit is based on davinci (4096 tokens). For other LLMs, at the moment, you have a few options:
1) You can manually specify chunk_size_limit when building the index, to split the text chunks in a way that fits the prompt limit. Note: as a rule of thumb, you should set chunk_size_limit to the maximum input limit of the LLM minus roughly 200 tokens, which leaves headroom for the prompt template (sketched below).
For instance,
index = GPTListIndex(documents, chunk_size_limit=256)
I have a TODO to automatically infer the chunk size limit depending on the LLM that you are using!
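A minimal sketch of the rule of thumb above, assuming a hypothetical model with a 2048-token input limit:
max_input_size = 2048  # hypothetical: your LLM's maximum input limit
chunk_size_limit = max_input_size - 200  # rule of thumb: leave ~200 tokens of headroom for the prompt
index = GPTListIndex(documents, chunk_size_limit=chunk_size_limit)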
2) You can manually define a PromptHelper (this is not exposed at all in the docs right now, so I'm just giving you a code example - I'll leave a TODO!). You set max_input_size to the maximum input limit of the LLM.
from gpt_index import GPTListIndex
from gpt_index.indices.prompt_helper import PromptHelper
....
max_input_size = 2048  # example value: set this to the maximum input limit of your LLM
num_output = 256  # tokens reserved for the model's output
max_chunk_overlap = 0  # token overlap between neighboring chunks
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
# pass the prompt helper into the index during construction
index = GPTListIndex(documents, prompt_helper=prompt_helper)
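As a quick usage sketch (assuming documents are already loaded), querying then works as usual; the helper uses these values to work out how much room remains for context chunks:
response = index.query("Summarize the documents.")
print(response)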
Let me know if either of these options works for you
Will try it out, thanks!
@aashay96 was this still an issue?
going to close for now
I run into the following error when using gpt2 from huggingface -
ValueError: Error raised by inference API: Input is too long for this model, shorten your input or use 'parameters': {'truncation': 'only_first'} to run the model only on the first part.
Can the index not be built chunk by chunk? Or am I missing something?
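For context, GPT-2's maximum input is 1024 tokens (versus davinci's 4096), so the default chunk sizing overflows it; the index does split documents chunk by chunk, but each chunk plus the prompt template must still fit in the model's window whenever the LLM is called. A minimal sketch of option 2 above with GPT-2's limit plugged in (the num_output value here is an illustrative choice, not a required one):
from gpt_index import GPTListIndex
from gpt_index.indices.prompt_helper import PromptHelper

max_input_size = 1024  # GPT-2's context window
num_output = 128  # illustrative: keep the reserved output small so chunks fit in the 1024-token budget
max_chunk_overlap = 0
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
index = GPTListIndex(documents, prompt_helper=prompt_helper)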