Answer generated by a 🤖
Source: langchain/chat_models/openai.py
The issue is caused by the get_num_tokens_from_messages method not handling the case where a value is a dictionary, which happens when a function call is included in the response. To fix this, you can modify the method to handle dictionaries by recursively counting the tokens in the nested values.
Here's a modified version of the get_num_tokens_from_messages method that handles dictionaries:
def get_num_tokens_from_messages(self, messages):
    def count_tokens(obj, encoding):
        # Strings are encoded directly; dicts (such as function_call
        # payloads) are counted by recursing into their values.
        if isinstance(obj, str):
            return len(encoding.encode(obj))
        elif isinstance(obj, dict):
            return sum(count_tokens(v, encoding) for v in obj.values())
        else:
            return 0

    model, encoding = self._get_encoding_model()
    if model.startswith("gpt-3.5-turbo"):
        # Every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_message = 4
        # If there's a name, the role is omitted.
        tokens_per_name = -1
    elif model.startswith("gpt-4"):
        tokens_per_message = 3
        tokens_per_name = 1
    else:
        raise NotImplementedError(
            f"get_num_tokens_from_messages() is not presently implemented "
            f"for model {model}. "
            "See https://github.com/openai/openai-python/blob/main/chatml.md for "
            "information on how messages are converted to tokens."
        )
    num_tokens = 0
    messages_dict = [_convert_message_to_dict(m) for m in messages]
    for message in messages_dict:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += count_tokens(value, encoding)
            if key == "name":
                num_tokens += tokens_per_name
    # Every reply is primed with <|start|>assistant<|message|>.
    num_tokens += 3
    return num_tokens
Now, when you call llm.get_num_tokens_from_messages([response]), it should handle the function call response correctly and not raise a TypeError.
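For instance, a minimal sketch assuming the langchain 0.0.x API (the AIMessage construction, model name, and function_call payload below are made up for illustration; llm is a ChatOpenAI instance carrying the patched method):

from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage

llm = ChatOpenAI(model="gpt-3.5-turbo-0613")

# An assistant message whose additional_kwargs holds a function_call dict;
# _convert_message_to_dict copies it into the message dict, which is where
# the old code hit a non-string value.
response = AIMessage(
    content="",
    additional_kwargs={
        "function_call": {
            "name": "get_weather",
            "arguments": '{"city": "Tokyo"}',
        }
    },
)

print(llm.get_num_tokens_from_messages([response]))  # prints an int, no TypeError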
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
Hi, @nyanp! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue is with the get_num_tokens_from_messages function in the ChatOpenAI class. It raises a TypeError when a function call is included in the messages. The suggested solution is to modify the method to handle dictionaries by recursively counting tokens in the nested values. A modified version of the get_num_tokens_from_messages method has been provided in the comments, which should resolve the issue. It's worth noting that two users, devstein and hatopopvr, have given a thumbs-up reaction to the suggested solution, indicating that it is likely to be a resolution.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain repository! Let us know if you have any further questions or concerns.
System Info
Who can help?
No response
Information
Related Components
Reproduction
Executing the following code raises TypeError: expected string or buffer.
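A minimal sketch of such code, assuming the langchain 0.0.x API and a function-calling model (the get_weather schema below is made up for illustration):

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model="gpt-3.5-turbo-0613")
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# The model answers with a function_call, which ends up as a dict inside
# the converted message.
response = llm.predict_messages(
    [HumanMessage(content="What is the weather in Tokyo?")],
    functions=functions,
)

llm.get_num_tokens_from_messages([response])  # raises TypeError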
get_num_tokens_from_messages internally converts messages to dicts with _convert_message_to_dict and then iterates over all key-value pairs to count the number of tokens. The code expects each value to be a string, but when a function call is included, the value contains a dictionary, so an exception is raised.
Expected behavior
As far as I know, there is no officially documented way to calculate the exact token consumption when using function calls.
Someone on the OpenAI forum has posted a method for calculating the tokens, so perhaps that method could be adopted.
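As a rough illustration of such an approach (a sketch of the recursive idea above applied directly with tiktoken, not the forum method itself, and not OpenAI's official accounting; the payload values are made up):

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

# A function_call dict as returned by the API: both values are strings,
# so each can be encoded and the token counts summed.
function_call = {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}
approx_tokens = sum(len(encoding.encode(v)) for v in function_call.values())
print(approx_tokens)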