langchain-ai / langchain-nvidia


More informative errors when a newer package version is needed to query specific models #29

Closed: aishwaryap closed this issue 4 months ago

aishwaryap commented 4 months ago

Hi all, I was experimenting with langchain-nvidia-ai-endpoints==0.0.4, and when I tried querying the ai-llama3-70b model, I got the following error:
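For reference, here is a minimal reproduction, reconstructed from the notebook cell shown in the traceback below (it assumes an API key is available via the NVIDIA_API_KEY environment variable):

```python
# Minimal reproduction sketch; assumes NVIDIA_API_KEY is set in the
# environment and langchain-nvidia-ai-endpoints==0.0.4 is installed.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="ai-llama3-70b")
result = llm.invoke("How do I query NVIDIA models in LangChain?")
print(result.content)
```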

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[1], line 7
      5 # llm = ChatNVIDIA(model="mixtral_8x7b")
      6 llm = ChatNVIDIA(model="ai-llama3-70b")
----> 7 result = llm.invoke("How do I query NVIDIA models in LangChain?")
      8 print(result.content)

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:158, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
    147 def invoke(
    148     self,
    149     input: LanguageModelInput,
   (...)
    153     **kwargs: Any,
    154 ) -> BaseMessage:
    155     config = ensure_config(config)
    156     return cast(
    157         ChatGeneration,
--> 158         self.generate_prompt(
    159             [self._convert_input(input)],
    160             stop=stop,
    161             callbacks=config.get("callbacks"),
    162             tags=config.get("tags"),
    163             metadata=config.get("metadata"),
    164             run_name=config.get("run_name"),
    165             run_id=config.pop("run_id", None),
    166             **kwargs,
    167         ).generations[0][0],
    168     ).message

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:560, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
    552 def generate_prompt(
    553     self,
    554     prompts: List[PromptValue],
   (...)
    557     **kwargs: Any,
    558 ) -> LLMResult:
    559     prompt_messages = [p.to_messages() for p in prompts]
--> 560     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:421, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    419         if run_managers:
    420             run_managers[i].on_llm_error(e, response=LLMResult(generations=[]))
--> 421         raise e
    422 flattened_outputs = [
    423     LLMResult(generations=[res.generations], llm_output=res.llm_output)  # type: ignore[list-item]
    424     for res in results
    425 ]
    426 llm_output = self._combine_llm_outputs([res.llm_output for res in results])

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:411, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    408 for i, m in enumerate(messages):
    409     try:
    410         results.append(
--> 411             self._generate_with_cache(
    412                 m,
    413                 stop=stop,
    414                 run_manager=run_managers[i] if run_managers else None,
    415                 **kwargs,
    416             )
    417         )
    418     except BaseException as e:
    419         if run_managers:

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:632, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
    630 else:
    631     if inspect.signature(self._generate).parameters.get("run_manager"):
--> 632         result = self._generate(
    633             messages, stop=stop, run_manager=run_manager, **kwargs
    634         )
    635     else:
    636         result = self._generate(messages, stop=stop, **kwargs)

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/chat_models.py:155, in ChatNVIDIA._generate(self, messages, stop, run_manager, **kwargs)
    148 def _generate(
    149     self,
    150     messages: List[BaseMessage],
   (...)
    153     **kwargs: Any,
    154 ) -> ChatResult:
--> 155     responses = self._call(messages, stop=stop, run_manager=run_manager, **kwargs)
    156     self._set_callback_out(responses, run_manager)
    157     message = ChatMessage(**self.custom_postprocess(responses))

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/chat_models.py:186, in ChatNVIDIA._call(self, messages, stop, run_manager, **kwargs)
    184 """Invoke on a single list of chat messages."""
    185 inputs = self.custom_preprocess(messages)
--> 186 responses = self.get_generation(inputs=inputs, stop=stop, **kwargs)
    187 return responses

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/chat_models.py:310, in ChatNVIDIA.get_generation(self, inputs, **kwargs)
    308 stop = kwargs["stop"] = kwargs.get("stop") or self.stop
    309 payload = self.get_payload(inputs=inputs, stream=False, **kwargs)
--> 310 out = self.client.get_req_generation(self.model, stop=stop, payload=payload)
    311 return out

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:387, in NVEModel.get_req_generation(self, model_name, payload, invoke_url, stop, endpoint)
    385 """Method for an end-to-end post query with NVE post-processing."""
    386 invoke_url = self._get_invoke_url(model_name, invoke_url, endpoint=endpoint)
--> 387 response = self.get_req(model_name, payload, invoke_url)
    388 output, _ = self.postprocess(response, stop=stop)
    389 return output

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:374, in NVEModel.get_req(self, model_name, payload, invoke_url, stop, endpoint)
    372 if payload.get("stream", False) is True:
    373     payload = {**payload, "stream": False}
--> 374 response, session = self._post(invoke_url, payload)
    375 return self._wait(response, session)

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:213, in NVEModel._post(self, invoke_url, payload)
    211 session = self.get_session_fn()
    212 self.last_response = response = session.post(**self.last_inputs)
--> 213 self._try_raise(response)
    214 return response, session

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:285, in NVEModel._try_raise(self, response)
    283 if str(status) == "401":
    284     body += "\nPlease check or regenerate your API key."
--> 285 raise Exception(f"{header}\n{body}") from None

Exception: [404] Not Found
Inference error
RequestID: d39bb278-1623-4561-a556-0b43547ab10e

It turned out that all I needed to do to query Llama 3 was upgrade to 0.0.8. Is it intentional that newer models are incompatible with older package versions, even across minor releases? This is inconvenient when langchain-nvidia-ai-endpoints is a dependency of another open source package that may not stay up to date with the latest releases.

Additionally, could it fail with a more informative error that lets the user know they could query this model if they upgraded the package? An alternative would be to keep the GitHub README or LangChain documentation updated with the minimum package version required to query each model. It would also be nice if the output of llm.available_models showed the minimum package version needed to query each model.
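For illustration, the kind of pre-flight check I have in mind might look roughly like this; the function name and table argument are hypothetical, not the package's actual internals:

```python
# Hypothetical sketch of a more informative failure mode; _check_model_known
# and model_table are invented names, not the package's real internals.
from importlib.metadata import version

def _check_model_known(model_name: str, model_table: dict) -> None:
    """Fail early with an actionable message when a model name is unknown."""
    if model_name not in model_table:
        pkg = "langchain-nvidia-ai-endpoints"
        raise ValueError(
            f"Model '{model_name}' is not in the model table shipped with "
            f"{pkg}=={version(pkg)}. It may be supported by a newer release; "
            f"try upgrading the package. Known models: {sorted(model_table)}"
        )
```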

aishwaryap commented 4 months ago

Note that the trace includes "Please check or regenerate your API key", but the response status was actually a 404, and I did not need to regenerate my key. I only needed to upgrade the package to successfully query the newer models.

mattf commented 4 months ago

@aishwaryap thank you for reporting this. Currently, supporting new models requires updating a table in the package with invocation information; we're working on automating this.
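For context, that table is essentially a static mapping from model names to invocation details; a loose sketch (the field names here are illustrative, not the exact schema):

```python
# Illustrative sketch only; the real table's fields and values differ.
MODEL_TABLE = {
    "mixtral_8x7b": {
        "invoke_url": "...",  # per-model invocation endpoint (elided)
        "client": "ChatNVIDIA",
    },
    "ai-llama3-70b": {
        "invoke_url": "...",  # per-model invocation endpoint (elided)
        "client": "ChatNVIDIA",
    },
}
```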

As for the informative error, that is also tracked in https://github.com/langchain-ai/langchain-nvidia/issues/21.

mattf commented 4 months ago

We will track the informative error request in #21.