langchain-ai / langchain-nvidia

Unable to query `ai-embed-qa-4`; uninformative error #30

Closed: aishwaryap closed this issue 4 months ago

aishwaryap commented 6 months ago

I am trying to experiment with different embedding models in a RAG application, building off of the example here. It works fine when I create an NVIDIAEmbeddings object with model="nvolveqa_40k", but with model="ai-embed-qa-4" it fails at the vectorstore creation step, i.e.

vectorstore = FAISS.from_documents(documents, document_embedder)

with the following uninformative error:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[6], line 7
      5 # document_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="passage")
      6 document_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
----> 7 vectorstore = FAISS.from_documents(documents, document_embedder)
      8 retriever = vectorstore.as_retriever()
     10 user_input = "How do I query NVIDIA models in LangChain?"

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/vectorstores.py:550, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
    548 texts = [d.page_content for d in documents]
    549 metadatas = [d.metadata for d in documents]
--> 550 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_community/vectorstores/faiss.py:930, in FAISS.from_texts(cls, texts, embedding, metadatas, ids, **kwargs)
    903 @classmethod
    904 def from_texts(
    905     cls,
   (...)
    910     **kwargs: Any,
    911 ) -> FAISS:
    912     """Construct FAISS wrapper from raw documents.
    913 
    914     This is a user friendly interface that:
   (...)
    928             faiss = FAISS.from_texts(texts, embeddings)
    929     """
--> 930     embeddings = embedding.embed_documents(texts)
    931     return cls.__from(
    932         texts,
    933         embeddings,
   (...)
    937         **kwargs,
    938     )

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/embeddings.py:142, in NVIDIAEmbeddings.embed_documents(self, texts)
    136     batch = texts[i : i + self.max_batch_size]
    137     truncated = [
    138         text[: self.max_length] if len(text) > self.max_length else text
    139         for text in batch
    140     ]
    141     all_embeddings.extend(
--> 142         self._embed(truncated, model_type=self.model_type or "passage")
    143     )
    144 return all_embeddings

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/embeddings.py:105, in NVIDIAEmbeddings._embed(self, texts, model_type)
    102     if self.truncate:
    103         payload["truncate"] = self.truncate
--> 105 response = self.client.get_req(
    106     model_name=self.model,
    107     payload=payload,
    108     endpoint="infer",
    109 )
    110 response.raise_for_status()
    111 result = response.json()

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:392, in NVEModel.get_req(self, model_name, payload, invoke_url, stop, endpoint)
    390 if payload.get("stream", False) is True:
    391     payload = {**payload, "stream": False}
--> 392 response, session = self._post(invoke_url, payload)
    393 return self._wait(response, session)

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:220, in NVEModel._post(self, invoke_url, payload)
    218 session = self.get_session_fn()
    219 self.last_response = response = session.post(**self.last_inputs)
--> 220 self._try_raise(response)
    221 return response, session

File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:303, in NVEModel._try_raise(self, response)
    301     body += "\nPlease check or regenerate your API key."
    302 # todo: raise as an HTTPError
--> 303 raise Exception(f"{header}\n{body}") from None

Exception: [400] Bad Request
Inference error
RequestID: e7d48fb8-0a64-49d3-8cf3-a0c9ddbddbb4

I had noticed that for generation models this error sometimes simply means that a newer package version is required, and I have filed an issue requesting more informative errors in that case. With this model, however, I get this error even with the latest version (0.0.9) and a newly generated API key.

If the model is not yet supported, can it be hidden from the output of available_models?

mattf commented 6 months ago

If the model is not yet supported, can it be hidden from the output of available_models?

this is a good suggestion; it is also mentioned in https://github.com/langchain-ai/langchain-nvidia/issues/26 for chat completion models.
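
in the meantime, you can inspect what the endpoint advertises and filter client-side. a minimal sketch, assuming the available_models property referenced above; its exact return type has varied across package versions, so treat the entries as opaque:

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Print whatever the endpoint advertises (assumes NVIDIA_API_KEY is set);
# the entry type differs between package versions, so just print them.
for model in ChatNVIDIA().available_models:
    print(model)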

mattf commented 6 months ago

@aishwaryap is it possible one of the documents you are sending to FAISS is empty? the service rejects empty content.
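
a quick way to check; a minimal sketch, where documents is whatever list you pass to FAISS.from_documents:

# Flag documents whose page_content is empty or whitespace-only,
# since the service rejects empty inputs.
empty = [i for i, d in enumerate(documents) if not d.page_content.strip()]
print(f"{len(empty)} empty documents at indices {empty}")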

aishwaryap commented 6 months ago

@mattf I'm reasonably sure they are not. I took a working example with the nvolveqa_40k model and replaced only the embedding model with ai-embed-qa-4 to reproduce this issue. I assume empty documents would error out with any model.

That said, I can create a self-contained example and add it for testing.

aishwaryap commented 6 months ago

Sample self-contained script:

import os
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

os.environ["NVIDIA_API_KEY"] = "nvapi-<redacted>"

urls = ["https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/"]
loader = AsyncHtmlLoader(urls)
docs = loader.load()
html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(docs)
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs_transformed)
print("Number of chunks: ", len(chunks))
print("----------------------")
print("Chunk text at index 2: \n")
print(chunks[1].page_content)
print("----------------------")
# Legacy model: works (it silently truncates over-long inputs)
nvolveqa_embedder = NVIDIAEmbeddings(model="nvolveqa_40k")
nvolveqa_vectorstore = FAISS.from_documents(chunks, nvolveqa_embedder)
nvolveqa_retriever = nvolveqa_vectorstore.as_retriever()
user_input = "How do I query NVIDIA models in LangChain?"
nvolveqa_retrieved = nvolveqa_retriever.get_relevant_documents(user_input)
print("----------------------")
print("Top retrieved chunk text from nvolveqa-40k: \n")
print(nvolveqa_retrieved[0].page_content)
print("----------------------")
# Newer model: fails with [400] Bad Request on over-long inputs
embedqa4_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
embedqa4_vectorstore = FAISS.from_documents(chunks, embedqa4_embedder)
embedqa4_retriever = embedqa4_vectorstore.as_retriever()
user_input = "How do I query NVIDIA models in LangChain?"
embedqa4_retrieved = embedqa4_retriever.get_relevant_documents(user_input)
print("----------------------")
print("Top retrieved chunk text from ai-embed-qa-4: \n")
print(embedqa4_retrieved[0].page_content)
print("----------------------")

My output (stderr + stdout):

Fetching pages: 100%|###########################################################################################| 1/1 [00:00<00:00, 27.39it/s]
Created a chunk of size 2313, which is longer than the specified 2000
Number of chunks:  22
----------------------
Chunk text at index 1: 

* Graphs

    * Callbacks

    * Chat loaders

    * Adapters

    * Stores

  *   * Components
  * Chat models
  * NVIDIA AI Foundation Endpoints

On this page

# NVIDIA AI Foundation Endpoints

The `ChatNVIDIA` class is a LangChain chat model that connects to NVIDIA AI
Foundation Endpoints.

> NVIDIA AI Foundation Endpoints give users easy access to NVIDIA hosted API
> endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable
> Diffusion, etc. These models, hosted on the NVIDIA NGC catalog, are
> optimized, tested, and hosted on the NVIDIA AI platform, making them fast
> and easy to evaluate, further customize, and seamlessly run at peak
> performance on any accelerated stack.
>
> With NVIDIA AI Foundation Endpoints, you can get quick results from a fully
> accelerated stack running on NVIDIA DGX Cloud. Once customized, these models
> can be deployed anywhere with enterprise-grade security, stability, and
> support using NVIDIA AI Enterprise.
>
> These models can be easily accessed via the `langchain-nvidia-ai-endpoints`
> package, as shown below.

This example goes over how to use LangChain to interact with and develop LLM-
powered systems using the publicly-accessible AI Foundation endpoints.

## Installation​

    %pip install --upgrade --quiet langchain-nvidia-ai-endpoints  

    Note: you may need to restart the kernel to use updated packages.  

## Setup​

**To get started:**

  1. Create a free account with the NVIDIA NGC service, which hosts AI solution catalogs, containers, models, etc.

  2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.

  3. Select the `API` option and click `Generate Key`.

  4. Save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.
----------------------
----------------------
Top retrieved chunk text from nvolveqa-40k: 

(identical to the chunk printed above)
----------------------
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[1], line 32
     30 print("----------------------")
     31 embedqa4_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
---> 32 embedqa4_vectorstore = FAISS.from_documents(chunks, embedqa4_embedder)
     33 embedqa4_retriever = embedqa4_vectorstore.as_retriever()
     34 user_input = "How do I query NVIDIA models in LangChain?"

(... intermediate frames identical to the traceback shown above, ending in langchain_nvidia_ai_endpoints/_common.py:303, _try_raise ...)

Exception: [400] Bad Request
Inference error
RequestID: 11a69c13-e43c-4f0e-960d-98f4a3f4f706

I also verified using pip show that I am on version 0.0.9; langchain-nvidia-ai-endpoints does not have a __version__ attribute to verify this in code (issue).

(nvaif_env) ➜  ~ pip show langchain-nvidia-ai-endpoints
Name: langchain-nvidia-ai-endpoints
Version: 0.0.9
Summary: An integration package connecting NVIDIA AI Endpoints and LangChain
Home-page: https://github.com/langchain-ai/langchain
Author:
Author-email:
License: MIT
Location: <redacted>
Requires: aiohttp, langchain-core, pillow
Required-by:
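
In the meantime, the installed version can be read from the package metadata, with no __version__ attribute required; a minimal sketch using only the standard library:

from importlib.metadata import version

# Reads the version from the installed distribution's metadata.
print(version("langchain-nvidia-ai-endpoints"))  # prints: 0.0.9
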
mattf commented 5 months ago

@aishwaryap thank you for the reproducer, it helped me narrow this down.

i believe the issue is that some of the inputs are longer than the embedding model allows. in this case you can pass `truncate="END"`, e.g. `NVIDIAEmbeddings(model="ai-embed-qa-4", truncate="END")`.

this is not an issue with the nvolveqa_40k model because it silently truncates your input, while the new models reject over-long input by default.
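
applied to the reproducer above, the change is one line; a sketch, where chunks comes from the script earlier in this thread:

from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# truncate="END" asks the service to clip over-long inputs at the end
# instead of rejecting the whole request with [400] Bad Request.
embedqa4_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4", truncate="END")
embedqa4_vectorstore = FAISS.from_documents(chunks, embedqa4_embedder)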

does that resolve your issue?

apolo74 commented 4 months ago

Hi @mattf, I just found this thread and wanted to say that your suggestion worked, at least for the issue at vectorstore = FAISS.from_documents(documents, document_embedder), so thanks for that information. However, I have a follow-up question: a new error arises when working with a chain and the vectorstore retriever. I'm trying to ingest scientific articles in PDF format, and after running the following chain:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# vectorstore and llm are created earlier (llm was microsoft/phi-3-mini-4k-instruct)
retriever = vectorstore.as_retriever()
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer solely based on the following context:\n<Documents>\n{context}\n</Documents>",
        ),
        ("user", "{question}"),
    ]
)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
response = chain.invoke("A question for the PDF")

I get the following error:

  File ".venv\Lib\site-packages\langchain_nvidia_ai_endpoints\_common.py", line 311, in _try_raise
    raise Exception(f"{header}\n{body}") from None
Exception: [500] Internal Server Error
Input value error: prompt is [[5201]] long while only 2048 is supported

Any ideas how to solve this part? Thanks in advance for any help you may provide!
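
One way to keep the assembled prompt below a small context window is to retrieve fewer chunks; a minimal sketch, where the k value is illustrative rather than a recommendation:

# Limit retrieval to 2 chunks so the prompt stays small enough
# for a 2048-token context window.
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})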

mattf commented 4 months ago

@apolo74 please open this as a new issue; it appears unrelated to embedding and has an informative error.

apolo74 commented 4 months ago

Hi again @mattf, I solved this a couple of minutes ago... I was using a small model (microsoft/phi-3-mini-4k-instruct). The errors went away the moment I switched to larger models, so the error was related to the size of the LLM's context. Thanks for your help!

mattf commented 4 months ago

@aishwaryap recent changes server-side should have fully resolved this. please reopen this if you still have an issue.