run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

Token indices sequence length is longer than the specified maximum sequence length for this model (3793 > 1024). Running this sequence through the model will result in indexing errors #987

Closed claysauruswrecks closed 1 year ago

claysauruswrecks commented 1 year ago

Initially I thought the error was due to the loader not splitting chunks, but I'm still getting the mentioned error after adding a splitter. Maybe it's coming from OpenAI's API?

Bugfix branch: https://github.com/claysauruswrecks/llama-hub/tree/bugfix/github-repo-splitter

import logging
import os
import pickle

from llama_index import GPTSimpleVectorIndex, download_loader

assert (
    os.getenv("OPENAI_API_KEY") is not None
), "Please set the OPENAI_API_KEY environment variable."

logging.basicConfig(level=logging.DEBUG)

LLAMA_HUB_CONTENTS_URL = "https://raw.githubusercontent.com/claysauruswrecks/llama-hub/bugfix/github-repo-splitter"
LOADER_HUB_PATH = "/loader_hub"
LOADER_HUB_URL = LLAMA_HUB_CONTENTS_URL + LOADER_HUB_PATH

# Fetch the GithubRepositoryReader implementation from the bugfix branch above
download_loader(
    "GithubRepositoryReader", loader_hub_url=LOADER_HUB_URL, refresh_cache=True
)

from llama_index.readers.llamahub_modules.github_repo import (
    GithubClient,
    GithubRepositoryReader,
)

docs = None

# Reuse previously loaded documents if a local cache exists
if os.path.exists("docs.pkl"):
    with open("docs.pkl", "rb") as f:
        docs = pickle.load(f)

if docs is None:
    github_client = GithubClient(os.getenv("GITHUB_TOKEN"))
    loader = GithubRepositoryReader(
        github_client,
        owner="jerryjliu",
        repo="llama_index",
        filter_directories=(
            ["gpt_index", "docs"],
            GithubRepositoryReader.FilterType.INCLUDE,
        ),
        filter_file_extensions=([".py"], GithubRepositoryReader.FilterType.INCLUDE),
        verbose=True,
        concurrent_requests=10,
    )

    docs = loader.load_data(commit_sha="1b739e1fcd525f73af4a7131dd52c7750e9ca247")

    with open("docs.pkl", "wb") as f:
        pickle.dump(docs, f)

index = GPTSimpleVectorIndex.from_documents(docs)

index.query("Explain each LlamaIndex class?")
claysauruswrecks commented 1 year ago

It appears I might be able to address this by using the PromptHelper to split after the loader's execution.

From Kapa.ai


Here's an example of how to set up a PromptHelper with custom parameters:

from llama_index import PromptHelper

# Set maximum input size
max_input_size = 1024
# Set number of output tokens
num_output = 256
# Set maximum chunk overlap
max_chunk_overlap = 20

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

Then, you can create a ServiceContext with the PromptHelper:

from llama_index import ServiceContext, LLMPredictor
from langchain import OpenAI

# Define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

Finally, you can build your index with the service_context:

from llama_index import GPTSimpleVectorIndex
from your_data_loading_module import documents

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

By using the PromptHelper with the appropriate parameters, you can ensure that the input text does not exceed the model's maximum token limit and avoid the indexing errors.

For more information, refer to the PromptHelper documentation (https://gpt-index.readthedocs.io/en/latest/reference/service_context/prompt_helper.html).
jerryjliu commented 1 year ago

@claysauruswrecks instead of setting the prompt helper, one thing you can try to do is set the chunk_size_limit in the ServiceContext.

Just do

# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

does that work for you?
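
For reference, a self-contained sketch of this suggestion applied to the repro script above (the llm_predictor follows the earlier Kapa.ai example, docs is the GithubRepositoryReader output, and the keyword arguments assume the ServiceContext API of that era):

from langchain import OpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, ServiceContext

# Same LLM definition as in the Kapa.ai example above
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

# NOTE: chunk_size_limit < 1024 keeps every chunk under the warning threshold
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

# docs is the list returned by loader.load_data() in the repro script
index = GPTSimpleVectorIndex.from_documents(docs, service_context=service_context)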

claysauruswrecks commented 1 year ago

@jerryjliu - Excellent, yes. I also now see the notebook examples. I will open a PR to clarify in the docs.

karottc commented 1 year ago

@jerryjliu

However, after setting it up like this, the response from response = index.query("query something") has also become shorter and loses information.

jerryjliu commented 1 year ago

By default similarity_top_k=1; you can increase similarity_top_k in the index.query call.
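
For example, against the index built in the repro script above, a minimal sketch (the top_k value of 3 is just an illustration):

# Pull the top 3 matching chunks into the answer instead of only the single best match
response = index.query(
    "Explain each LlamaIndex class?",
    similarity_top_k=3,  # default is 1
)
print(response)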

bisonliao commented 1 year ago

Is it possible to process a document set of 2,000 text files, each with 5,000 words?
I want to use LlamaIndex to process my website docs and then create a smart assistant.
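
For context, a rough sketch of how a folder of text files could be loaded and indexed with the chunk-size settings discussed above (SimpleDirectoryReader is the stock directory reader; the "website_docs" path and the chunk size are assumptions):

from llama_index import GPTSimpleVectorIndex, ServiceContext, SimpleDirectoryReader

# Load every file in the directory as a Document
documents = SimpleDirectoryReader("website_docs").load_data()

# Keep chunks small so no single chunk exceeds the model's limit
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)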

pramitchoudhary commented 1 year ago
# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

Any concern about not exposing the other PromptHelper params via ServiceContext.from_defaults, especially max_chunk_overlap?
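
For what it's worth, the Kapa.ai example earlier in the thread suggests a workaround: construct the PromptHelper yourself and pass it in, e.g. (a sketch assuming the same ServiceContext API as above, with llm_predictor as defined earlier):

from llama_index import PromptHelper, ServiceContext

# max_chunk_overlap (and the other PromptHelper params) are set explicitly here
prompt_helper = PromptHelper(max_input_size=1024, num_output=256, max_chunk_overlap=20)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper
)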

Shane-Khong commented 1 year ago
# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

Any concern about not exposing the other PromptHelper params via ServiceContext.from_defaults, especially max_chunk_overlap?

I have a similar question, so hopefully I'm not repeating here: does [passing the chunk_size_limit=512 parameter directly into the service_context] do the same thing as [setting chunk_size_limit=512 in the prompt_helper and then passing that prompt_helper as a parameter into the service_context]?

Shane-Khong commented 1 year ago

Also, will setting chunk_size_limit = 512 result in a better outcome than chunk_size_limit = 2000 when summarising a 280-page document?

dxiaosa commented 1 year ago

@claysauruswrecks instead of setting the prompt helper, one thing you can try to do is set the chunk_size_limit in the ServiceContext.

Just do

# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

does that work for you?

Hello, "text-davinci-003" model can get 4,097 tokens at most, I just wonder why we still have the problem "Token indices sequence length is longer than the specified maximum sequence length for this model (2503 > 1024)."?

Majidbadal commented 1 year ago

I believe this issue is about the max output tokens, not the input tokens.

dosubot[bot] commented 1 year ago

Hi, @claysauruswrecks! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you raised is related to a token indices sequence length being longer than the specified maximum sequence length for a model. You suspect that the error may be coming from OpenAI's API and have provided a bugfix branch for reference. There have been discussions about using PromptHelper or setting the chunk_size_limit in the ServiceContext to address the issue. Some users have also raised questions about the impact on response length and the possibility of processing large documents.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LlamaIndex repository!