langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Providing `-1` to `max_tokens` when creating an OpenAI LLM with the `gpt-3.5-turbo-instruct` model fails #10822

Closed: carlosapd closed this issue 1 year ago

carlosapd commented 1 year ago

System Info

LangChain version: tested with both 0.0.294 and 0.0.295
Python version: 3.10.10 on Linux x86_64 (Ubuntu 22.04)
OpenAI library: tested with both 0.27.8 and 0.28.0

Who can help?

@hwchase17 @agola11


Reproduction

Run the following snippet:

from langchain import OpenAI

llm = OpenAI(
  model="gpt-3.5-turbo-instruct",
  verbose=True,
  max_tokens=-1,
)
llm("I am ")

The following error is returned (just the relevant stack trace):

File .venv/lib/python3.10/site-packages/langchain/llms/base.py:831, in BaseLLM.__call__(self, prompt, stop, callbacks, tags, metadata, **kwargs)
    824 if not isinstance(prompt, str):
    825     raise ValueError(
    826         "Argument `prompt` is expected to be a string. Instead found "
    827         f"{type(prompt)}. If you want to run the LLM on multiple prompts, use "
    828         "`generate` instead."
    829     )
    830 return (
--> 831     self.generate(
    832         [prompt],
    833         stop=stop,
    834         callbacks=callbacks,
    835         tags=tags,
    836         metadata=metadata,
    837         **kwargs,
    838     )
    839     .generations[0][0]
    840     .text
    841 )

File .venv/lib/python3.10/site-packages/langchain/llms/base.py:627, in BaseLLM.generate(self, prompts, stop, callbacks, tags, metadata, **kwargs)
    618         raise ValueError(
    619             "Asked to cache, but no cache found at `langchain.cache`."
    620         )
    621     run_managers = [
    622         callback_manager.on_llm_start(
    623             dumpd(self), [prompt], invocation_params=params, options=options
    624         )[0]
    625         for callback_manager, prompt in zip(callback_managers, prompts)
    626     ]
--> 627     output = self._generate_helper(
    628         prompts, stop, run_managers, bool(new_arg_supported), **kwargs
    629     )
    630     return output
    631 if len(missing_prompts) > 0:

File .venv/lib/python3.10/site-packages/langchain/llms/base.py:529, in BaseLLM._generate_helper(self, prompts, stop, run_managers, new_arg_supported, **kwargs)
    527     for run_manager in run_managers:
    528         run_manager.on_llm_error(e)
--> 529     raise e
    530 flattened_outputs = output.flatten()
    531 for manager, flattened_output in zip(run_managers, flattened_outputs):

File .venv/lib/python3.10/site-packages/langchain/llms/base.py:516, in BaseLLM._generate_helper(self, prompts, stop, run_managers, new_arg_supported, **kwargs)
    506 def _generate_helper(
    507     self,
    508     prompts: List[str],
   (...)
    512     **kwargs: Any,
    513 ) -> LLMResult:
    514     try:
    515         output = (
--> 516             self._generate(
    517                 prompts,
    518                 stop=stop,
    519                 # TODO: support multiple run managers
    520                 run_manager=run_managers[0] if run_managers else None,
    521                 **kwargs,
    522             )
    523             if new_arg_supported
    524             else self._generate(prompts, stop=stop)
    525         )
    526     except BaseException as e:
    527         for run_manager in run_managers:

File .venv/lib/python3.10/site-packages/langchain/llms/openai.py:357, in BaseOpenAI._generate(self, prompts, stop, run_manager, **kwargs)
    355 params = self._invocation_params
    356 params = {**params, **kwargs}
--> 357 sub_prompts = self.get_sub_prompts(params, prompts, stop)
    358 choices = []
    359 token_usage: Dict[str, int] = {}

File .venv/lib/python3.10/site-packages/langchain/llms/openai.py:459, in BaseOpenAI.get_sub_prompts(self, params, prompts, stop)
    455     if len(prompts) != 1:
    456         raise ValueError(
    457             "max_tokens set to -1 not supported for multiple inputs."
    458         )
--> 459     params["max_tokens"] = self.max_tokens_for_prompt(prompts[0])
    460 sub_prompts = [
    461     prompts[i : i + self.batch_size]
    462     for i in range(0, len(prompts), self.batch_size)
    463 ]
    464 return sub_prompts

File .venv/lib/python3.10/site-packages/langchain/llms/openai.py:616, in BaseOpenAI.max_tokens_for_prompt(self, prompt)
    602 """Calculate the maximum number of tokens possible to generate for a prompt.
    603 
    604 Args:
   (...)
    613         max_tokens = openai.max_token_for_prompt("Tell me a joke.")
    614 """
    615 num_tokens = self.get_num_tokens(prompt)
--> 616 return self.max_context_size - num_tokens

File .venv/lib/python3.10/site-packages/langchain/llms/openai.py:599, in BaseOpenAI.max_context_size(self)
    596 @property
    597 def max_context_size(self) -> int:
    598     """Get max context size for this model."""
--> 599     return self.modelname_to_contextsize(self.model_name)

File .venv/lib/python3.10/site-packages/langchain/llms/openai.py:589, in BaseOpenAI.modelname_to_contextsize(modelname)
    586 context_size = model_token_mapping.get(modelname, None)
    588 if context_size is None:
--> 589     raise ValueError(
    590         f"Unknown model: {modelname}. Please provide a valid OpenAI model name."
    591         "Known models are: " + ", ".join(model_token_mapping.keys())
    592     )
    594 return context_size

ValueError: Unknown model: gpt-3.5-turbo-instruct. Please provide a valid OpenAI model name.Known models are: gpt-4, gpt-4-0314, gpt-4-0613, gpt-4-32k, gpt-4-32k-0314, gpt-4-32k-0613, gpt-3.5-turbo, gpt-3.5-turbo-0301, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k, gpt-3.5-turbo-16k-0613, text-ada-001, ada, text-babbage-001, babbage, text-curie-001, curie, davinci, text-davinci-003, text-davinci-002, code-davinci-002, code-davinci-001, code-cushman-002, code-cushman-001

Originally got this through an LLMChain, but tracked it down to the LLM itself.

Expected behavior

The input should be processed correctly. I can confirm this works with other models such as text-davinci-003.
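
For reference, the same snippet runs fine when the model is one whose context size LangChain already knows, e.g. swapping in text-davinci-003:

from langchain import OpenAI

llm = OpenAI(
  model="text-davinci-003",
  verbose=True,
  max_tokens=-1,
)
llm("I am ")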

dosubot[bot] commented 1 year ago

🤖

Hello,

Thank you for bringing this to our attention. The issue you're experiencing arises because the OpenAI API does not accept a negative value for max_tokens. The OpenAIChat class has a specific check for max_tokens being -1: in that case it removes the max_tokens parameter from the request, since for the ChatGPT API omitting max_tokens is equivalent to having no limit on the number of tokens. That check is not present in the OpenAI class, which is used for the gpt-3.5-turbo-instruct model, so when -1 is passed as max_tokens it is forwarded as-is to the OpenAI API, which rejects negative values, and the call fails.
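
For illustration, the special-casing described above amounts to something like the following (a minimal sketch of the check, not the exact library source):

# Sketch only: drop max_tokens when it is -1 so the request carries no explicit limit.
if params.get("max_tokens") == -1:
    # For the ChatGPT API, omitting max_tokens is equivalent to "no limit".
    del params["max_tokens"]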

In addition, the get_sub_prompts method in the BaseOpenAI class also checks if max_tokens is -1. If it is, and if there are multiple prompts, it raises a ValueError, as -1 is not supported for multiple inputs. If there is only one prompt, it sets max_tokens to the maximum tokens for the prompt.

I found a similar issue in the LangChain repository, "Error when using gpt-3.5-turbo-instruct: model_token_mapping is missing an entry for gpt-3.5-turbo-instruct", which was closed by the pull request "misc: add gpt-3.5-turbo-instruct to model_token_mapping". It seems the gpt-3.5-turbo-instruct model was not included in the model_token_mapping dictionary, which maps model names to their respective maximum context sizes. That would explain the ValueError: Unknown model: gpt-3.5-turbo-instruct you're seeing.
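
If that is the case, the fix is essentially a one-line addition to that mapping, roughly like this (the context size below is an assumption for illustration; check the merged PR for the exact value):

# Sketch of the fix: add the missing model to model_token_mapping.
model_token_mapping = {
    # ... existing entries, e.g. "gpt-3.5-turbo": 4096, "text-davinci-003": 4097 ...
    "gpt-3.5-turbo-instruct": 4096,  # assumed 4k context window
}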

As a workaround, you could try setting max_tokens to a positive integer instead of -1. This should allow you to use the gpt-3.5-turbo-instruct model without encountering this error. However, please note that this will limit the length of the generated output to the number of tokens specified.
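
For example (256 here is only an illustration; any positive limit that fits your use case avoids the -1 code path):

from langchain import OpenAI

llm = OpenAI(
  model="gpt-3.5-turbo-instruct",
  verbose=True,
  max_tokens=256,
)
llm("I am ")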

We will look into adding a check for max_tokens being -1 in the OpenAI class and including gpt-3.5-turbo-instruct in the model_token_mapping dictionary in a future update. Thank you for your patience and understanding.

Best, Dosu


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

carlosapd commented 1 year ago

Duplicate of #10806, closing now.