Closed: claysauruswrecks closed this issue 1 year ago
It appears I might be able to address this by using the PromptHelper to split the text after the loader's execution.
From Kapa.ai
Here's an example of how to set up a PromptHelper with custom parameters:
from llama_index import PromptHelper
# Set maximum input size
max_input_size = 1024
# Set number of output tokens
num_output = 256
# Set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
Then, you can create a ServiceContext with the PromptHelper:
from llama_index import ServiceContext, LLMPredictor
from langchain import OpenAI
# Define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
Finally, you can build your index with the service_context:
from llama_index import GPTSimpleVectorIndex
from your_data_loading_module import documents
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
By using the PromptHelper with the appropriate parameters, you can ensure that the input text does not exceed the model's maximum token limit and avoid the indexing errors.
For more information, refer to the PromptHelper documentation (https://gpt-index.readthedocs.io/en/latest/reference/service_context/prompt_helper.html).
@claysauruswrecks instead of setting the prompt helper, one thing you can try to do is set the chunk_size_limit in the ServiceContext.
Just do
# NOTE: set a chunk size limit to < 1024 tokens
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
does that work for you?
@jerryjliu - Excellent, yes. I also now see the notebook examples. I will open a PR to clarify in the docs.
@jerryjliu
However, after setting it up like this, the response from response = index.query("query something")
has also become shorter, losing information.
By default, similarity_top_k=1; you can increase similarity_top_k in the index.query call.
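For example, a minimal sketch using the query API from earlier in this thread:
# Retrieve more source chunks per query so the answer can draw on more context
# (similarity_top_k defaults to 1)
response = index.query("query something", similarity_top_k=3)
print(response)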
Is it possible to process a document set of 2,000 text files, each with 5,000 words?
I want to use LlamaIndex to process my website docs and then create a smart assistant.
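For context, roughly what I have in mind, sketched from the examples above (the directory path is a placeholder, and I'm assuming the same chunk-limited service_context suggested earlier in this thread):
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
# Load every text file in the folder (placeholder path)
documents = SimpleDirectoryReader("./website_docs").load_data()
# Reuse the service_context with chunk_size_limit=512 from above
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
# Persist the index so the 2,000 files are only embedded once
index.save_to_disk("website_index.json")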
# NOTE: set a chunk size limit to < 1024 tokens
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
Any concern about not exposing other params of PromptHelper via ServiceContext.from_defaults? Especially max_chunk_overlap.
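A workaround, going by the earlier example in this thread, seems to be building the PromptHelper directly (where max_chunk_overlap is a regular argument) and passing it in; a short sketch assuming that same API version:
from llama_index import PromptHelper, ServiceContext
# Set the overlap explicitly on the helper
prompt_helper = PromptHelper(max_input_size=1024, num_output=256, max_chunk_overlap=20)
# from_defaults accepts a pre-built prompt_helper and uses it as given
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)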
I have a similar question, so hopefully not repeating here: does [directly inputting the chunk_size_limit=512 parameter into service_context] do the same thing as [setting chunk_size_limit=512 in prompt_helper, and then inputting prompt_helper as a parameter into service_context]?
Also, will setting chunk_size_limit = 512 result in a better outcome than chunk_size_limit = 2000 when summarising a 280-page document?
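For reference, a sketch of the two setups I'm comparing, assuming the API shown earlier in the thread and that PromptHelper accepts chunk_size_limit as an optional keyword argument:
# Option A: pass chunk_size_limit directly to the service context
service_context_a = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
# Option B: set chunk_size_limit on the PromptHelper and pass that in
prompt_helper = PromptHelper(max_input_size=1024, num_output=256, max_chunk_overlap=20, chunk_size_limit=512)
service_context_b = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)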
Hello, the "text-davinci-003" model can take 4,097 tokens at most, so I just wonder why we still have the problem "Token indices sequence length is longer than the specified maximum sequence length for this model (2503 > 1024)"?
This issue is about max output tokens, I believe, and not the input tokens.
Hi, @claysauruswrecks! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you raised is related to a token indices sequence length being longer than the specified maximum sequence length for a model. You suspect that the error may be coming from OpenAI's API and have provided a bugfix branch for reference. There have been discussions about using PromptHelper or setting the chunk_size_limit in the ServiceContext to address the issue. Some users have also raised questions about the impact on response length and the possibility of processing large documents.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to the LlamaIndex repository!
Initially I thought the error was due to the loader not splitting chunks, but I'm still getting the mentioned error after adding a splitter. Maybe it's coming from OpenAI's API?
Bugfix branch: https://github.com/claysauruswrecks/llama-hub/tree/bugfix/github-repo-splitter