langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.26k stars · 14.73k forks

Tiktoken version is too old for `gpt-3.5-turbo` #1881

Closed · xingfanxia closed this issue 1 year ago

xingfanxia commented 1 year ago
Traceback (most recent call last):
  File "/Users/xingfanxia/projects/notion-qa/qa.py", line 25, in <module>
    result = chain({"question": args.question})
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/base.py", line 116, in __call__
    raise e
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/base.py", line 113, in __call__
    outputs = self._call(inputs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/qa_with_sources/base.py", line 118, in _call
    answer, _ = self.combine_documents_chain.combine_docs(docs, **inputs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py", line 143, in combine_docs
    return self._process_results(results, docs, token_max, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py", line 173, in _process_results
    num_tokens = length_func(result_docs, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 83, in prompt_length
    return self.llm_chain.llm.get_num_tokens(prompt)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chat_models/openai.py", line 331, in get_num_tokens
    enc = tiktoken.encoding_for_model(self.model_name)
  File "/opt/homebrew/lib/python3.10/site-packages/tiktoken/model.py", line 51, in encoding_for_model
    raise KeyError(
KeyError: 'Could not automatically map gpt-3.5-turbo to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
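Editor's note: the error comes from tiktoken's `MODEL_TO_ENCODING` lookup: the model name (here `gpt-3.5-turbo`, or Azure's `gpt-35-turbo` below) is missing from the mapping table in the installed tiktoken version. A minimal sketch of that lookup with a hypothetical fallback helper (the dict is a small excerpt, not tiktoken's full table; the fallback to `cl100k_base`, the encoding used by gpt-3.5-turbo and gpt-4 in current tiktoken releases, is an assumption, not tiktoken's behaviour):

```python
# Excerpt-style mapping, mimicking tiktoken's MODEL_TO_ENCODING table.
# Note there is no "gpt-35-turbo" key, which is why Azure deployment
# names miss the lookup and raise KeyError.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
}

def encoding_name_for(model: str) -> str:
    """Hypothetical helper: fall back to cl100k_base on unknown models."""
    try:
        return MODEL_TO_ENCODING[model]
    except KeyError:
        # Assumption: unmapped chat-model names use cl100k_base.
        return "cl100k_base"

print(encoding_name_for("gpt-35-turbo"))  # → cl100k_base
```

Upgrading tiktoken so the table itself contains the model name is the cleaner fix, as later comments note.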
harithzulfaizal commented 1 year ago

I seem to be encountering the same issue when using gpt-4 despite having the latest version of Tiktoken. Any ideas as to why?

KeyError                                  Traceback (most recent call last)
Cell In[8], line 2, in answer(question)
      1 def answer(question):
----> 2     return chain({"question": question}, return_only_outputs=True)

File c:\Users\gpharith\Documents\langchain-policydoc\langchain4u\lib\site-packages\langchain\chains\base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116     raise e
    117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
    118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File c:\Users\gpharith\Documents\langchain-policydoc\langchain4u\lib\site-packages\langchain\chains\base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
    107 self.callback_manager.on_chain_start(
    108     {"name": self.__class__.__name__},
    109     inputs,
    110     verbose=self.verbose,
    111 )
    112 try:
--> 113     outputs = self._call(inputs)
...
     70         "Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect."
     71     ) from None
     73 return get_encoding(encoding_name)

KeyError: 'Could not automatically map gpt-4 to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
plchld commented 1 year ago

I get the same issue when I use AzureOpenAI gpt-3.5.

KeyError                                  Traceback (most recent call last)
Cell In[69], line 22
     20 PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
     21 chain = load_summarize_chain(gpt35, chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
---> 22 chain({"input_documents": docs}, return_only_outputs=True)

File ~/.pyenv/versions/3.10.0/envs/local/lib/python3.10/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116     raise e
    117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
    118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File ~/.pyenv/versions/3.10.0/envs/local/lib/python3.10/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
    107 self.callback_manager.on_chain_start(
    108     {"name": self.__class__.__name__},
    109     inputs,
    110     verbose=self.verbose,
    111 )
    112 try:
--> 113     outputs = self._call(inputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
...
     72         "Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect."
     73     ) from None
     75 return get_encoding(encoding_name)

KeyError: 'Could not automatically map gpt-35-turbo to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
awhillas commented 1 year ago

Seems to be set up to handle the latest models: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py#L13

SamOyeAH commented 1 year ago

Has anyone been able to solve this?

peterjhwang commented 1 year ago

I had the same issue. It works for me after updating the tiktoken version.

sangeetkumar1988 commented 1 year ago

Hi Peter, I'm facing the same issue. Could you let me know which tiktoken version you used to resolve it? I've updated both tiktoken and langchain to the latest versions, but I'm still getting the error.

peterjhwang commented 1 year ago

At the moment I am using tiktoken==0.4.0 and langchain==0.0.178. You can check that the model you are using is included in MODEL_TO_ENCODING here: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py
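Editor's note: the upgrade-and-verify steps this comment describes can be sketched as follows (version pins taken from the comment; the check assumes network access to PyPI):

```shell
# Upgrade both packages to versions whose MODEL_TO_ENCODING table
# includes gpt-3.5-turbo and gpt-4, then confirm the mapping resolves.
pip install --upgrade "tiktoken>=0.4.0" "langchain>=0.0.178"
python -c "import tiktoken; print(tiktoken.encoding_for_model('gpt-3.5-turbo').name)"
# expected: cl100k_base
```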

sangeetkumar1988 commented 1 year ago

Thanks for your response. Just to confirm, can we use gpt-35-turbo for text summarization?

sangeetkumar1988 commented 1 year ago

> At the moment I am using tiktoken==0.4.0 and langchain==0.0.178. You can check that the model you are using is included in MODEL_TO_ENCODING here: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py

Thanks for your response. Just to confirm, can we use gpt-35-turbo for text summarization or RetrievalQuestionAnswering kinds of work?

sangeetkumar1988 commented 1 year ago

Any idea why chain_type='map_reduce' can't be used with a custom prompt template? If we set chain_type='map_reduce', the method doesn't accept prompt=PROMPT.

NageshMashette commented 11 months ago

> At the moment I am using tiktoken==0.4.0 and langchain==0.0.178. You can check that the model you are using is included in MODEL_TO_ENCODING here: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py
>
> Thanks for your response. Just to confirm, can we use gpt-35-turbo for text summarization or RetrievalQuestionAnswering kinds of work?

Yes, you can.