run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

gpt-4 and gpt-4-32k support #3277

Closed jma7889 closed 1 year ago

jma7889 commented 1 year ago

Does llama_index support gpt-4's 8k input or gpt-4-32k's 32k input? I tried to use them but got errors such as:

InvalidRequestError                       Traceback (most recent call last)
Cell In[37], line 5
      2 from langchain import OpenAI
      4 # index = GPTSimpleVectorIndex(documents, llm_predictor=selected_predictor)
----> 5 index = GPTTreeIndex.from_documents(documents, service_context=service_context)
      7 # save index to file
      8 index.storage_context.persist()

File ~/miniconda3/envs/rfp-annotation/lib/python3.10/site-packages/llama_index/indices/base.py:93, in BaseGPTIndex.from_documents(cls, documents, storage_context, service_context, **kwargs)
     89     docstore.set_document_hash(doc.get_doc_id(), doc.get_doc_hash())
     91 nodes = service_context.node_parser.get_nodes_from_documents(documents)
---> 93 return cls(
     94     nodes=nodes,
     95     storage_context=storage_context,
     96     service_context=service_context,
     97     **kwargs,
     98 )

File ~/miniconda3/envs/rfp-annotation/lib/python3.10/site-packages/llama_index/indices/tree/base.py:77, in GPTTreeIndex.__init__(self, nodes, index_struct, service_context, summary_template, insert_prompt, num_children, build_tree, use_async, **kwargs)
     75 self.build_tree = build_tree
     76 self._use_async = use_async
---> 77 super().__init__(
     78     nodes=nodes,
     79     index_struct=index_struct,
...
    683         rbody, rcode, resp.data, rheaders, stream_error=stream_error
    684     )
    685 return resp

InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 6356 tokens. Please reduce the length of the messages.
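The 4097 in this error is the gpt-3.5-turbo context window, which suggests the request never actually reached a gpt-4 model. A minimal sketch of the budget check involved (the window sizes are assumptions based on OpenAI's published limits at the time; `fits_in_context` is a hypothetical helper, not a llama_index API):

```python
# Context windows per OpenAI docs at the time of this thread (assumption; verify for your account)
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 4097,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}

def fits_in_context(prompt_tokens: int, num_output: int, model_name: str) -> bool:
    """Return True if the prompt plus the reserved output tokens fit the model's window."""
    return prompt_tokens + num_output <= CONTEXT_WINDOWS[model_name]

# The failing request: 6356 prompt tokens against a 4097-token window
print(fits_in_context(6356, 256, "gpt-3.5-turbo"))  # False -> InvalidRequestError
print(fits_in_context(6356, 256, "gpt-4"))          # True  -> would have fit on gpt-4
```

So a 6356-token request that errors out at 4097 tokens is a sign the default (gpt-3.5) predictor is still in use, which matches the reply below.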

my code to select the model:

gpt4_32_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-4-32k"))
gpt4_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-4"))

selected_predictor = gpt4_predictor

# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
default_prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
gpt4_prompt_helper = PromptHelper(8191, num_output, max_chunk_overlap)
gpt4_32_prompt_helper = PromptHelper(32765, num_output, max_chunk_overlap)

selected_prompt_helper = gpt4_prompt_helper

service_context = ServiceContext.from_defaults(llm_predictor=gpt35_predictor, prompt_helper=selected_prompt_helper)
logan-markewich commented 1 year ago

In your sample code at the end, you are using gpt35_predictor instead of the gpt-4 predictor

Also, if you are loading an index from disk, make sure you pass the service context back in
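For the load-from-disk case, a minimal sketch using the llama_index API of that era (~0.6); the `persist_dir` value is an assumption matching the default used by `storage_context.persist()`:

```python
# Sketch, llama_index ~0.6 API: reload a persisted index and re-attach the service context
from llama_index import StorageContext, load_index_from_storage

# persist_dir="./storage" is an assumption (the default persist location)
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Pass the gpt-4 service_context back in; otherwise the loaded index falls back
# to the default predictor and its smaller context window.
index = load_index_from_storage(storage_context, service_context=service_context)
```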

jma7889 commented 1 year ago

Thanks. The key is that the service context needs to be passed in again when loading an index from disk. That resolved the issue; it can be closed.