
[Bug]: RAFT dataset generation stops if context is too big #12883

Closed BayramAnnakov closed 1 month ago

BayramAnnakov commented 4 months ago

Bug Description

Name: llama-index-packs-raft-dataset Version: 0.1.4

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 10428 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

Version

0.1.4

Steps to Reproduce

Run the RAFT dataset pack over a document large enough that a single chunk plus the generated prompt exceeds the model's 8192-token context window.
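A minimal reproduction sketch (the file path is hypothetical, and the constructor usage assumes the 0.1.x pack, whose default LLM is gpt-4):

```python
from llama_index.packs.raft_dataset import RAFTDatasetPack

# With the default gpt-4 LLM (8192-token context window), a chunk whose
# prompt exceeds the limit raises openai.BadRequestError:
# context_length_exceeded, and generation stops.
raft = RAFTDatasetPack("large_document.txt")  # hypothetical input file
dataset = raft.run()
```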

Relevant Logs/Tracebacks

File "/usr/local/lib/python3.11/site-packages/llama_index/packs/raft_dataset/base.py", line 210, in run
    self.add_chunk_to_dataset(
  File "/usr/local/lib/python3.11/site-packages/llama_index/packs/raft_dataset/base.py", line 140, in add_chunk_to_dataset
    qs = self.generate_instructions_gen(chunk, x)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/packs/raft_dataset/base.py", line 109, in generate_instructions_gen
    queries = str(self.llm.chat(messages)).split("\n")
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 144, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/llms/openai/base.py", line 296, in chat
    return chat_fn(messages, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/llms/openai/base.py", line 362, in _chat
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 667, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1208, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 897, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 988, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 10428 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
dosubot[bot] commented 4 months ago

To address the issue where RAFT dataset generation stops because the context exceeds the OpenAI API's 8192-token limit, you can modify the method in the RAFTDatasetPack class that sends the full chunk to the LLM (generate_instructions_gen in the traceback above) to handle large contexts more effectively. This involves splitting the context into smaller chunks that fit within the token limit and querying the model with each chunk. The responses can then be aggregated into a final answer.

Here's a strategy for implementing this:

  1. Determine an appropriate chunk size that, when combined with your question, does not exceed the maximum token limit. A starting point could be 2048 tokens, but adjust based on your needs.
  2. Split the context into chunks of this size.
  3. For each chunk, create a combined text of the question and the chunk, ensuring it doesn't exceed the maximum token limit.
  4. Query the model with each chunk and aggregate the responses.

This approach requires careful consideration of how to split the context and aggregate responses to maintain coherence and relevance in the final answer. It's a basic strategy that may need refinement based on the specifics of your application.
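A minimal sketch of steps 2–4, assuming tiktoken for token counting and a LlamaIndex-style llm.chat interface; split_by_tokens and answer_over_chunks are hypothetical helper names, and the prompts are illustrative:

```python
import tiktoken
from llama_index.core.llms import ChatMessage

CHUNK_TOKENS = 2048  # headroom below the 8192-token limit for question and reply

def split_by_tokens(text: str, chunk_tokens: int = CHUNK_TOKENS, model: str = "gpt-4"):
    """Split text into pieces of at most chunk_tokens tokens."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + chunk_tokens])
        for i in range(0, len(tokens), chunk_tokens)
    ]

def answer_over_chunks(llm, question: str, context: str) -> str:
    """Query the model once per chunk, then merge the partial answers."""
    partials = []
    for chunk in split_by_tokens(context):
        messages = [
            ChatMessage(
                role="user",
                content=f"Context:\n{chunk}\n\nQuestion: {question}",
            ),
        ]
        partials.append(str(llm.chat(messages)))
    # Naive aggregation: ask the model to merge the per-chunk answers.
    merge_prompt = (
        "Combine these partial answers into one coherent answer:\n"
        + "\n---\n".join(partials)
    )
    return str(llm.chat([ChatMessage(role="user", content=merge_prompt)]))
```

The naive merge step is the weakest link: for question generation (as in generate_instructions_gen), simply concatenating the per-chunk outputs may be preferable to asking the model to merge them.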

This suggestion is tailored to the code structure and error encountered in llama-index-packs-raft-dataset version 0.1.4, specifically the generate_instructions_gen call shown in the traceback. For further insights or related solutions, reviewing similar issues and pull requests in the repository's issue tracker might provide additional context or alternative approaches.

BayramAnnakov commented 4 months ago

As a workaround, use the gpt-4-turbo model when initializing RAFTDatasetPack (it uses gpt-4 by default), since gpt-4-turbo has a much bigger context window.
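A sketch of that workaround, assuming RAFTDatasetPack accepts an llm argument as in the 0.1.x pack (the file path is hypothetical):

```python
from llama_index.llms.openai import OpenAI
from llama_index.packs.raft_dataset import RAFTDatasetPack

# gpt-4-turbo's 128k-token context window avoids the 8192-token
# limit hit by the default gpt-4 model.
raft = RAFTDatasetPack(
    "path/to/source_document.txt",  # hypothetical input file
    llm=OpenAI(model="gpt-4-turbo"),
)
dataset = raft.run()
```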