aishwaryap closed this issue 4 months ago
> If the model is not yet supported, can it be hidden from the output of available_models?
this is a good suggestion, also mentioned in https://github.com/langchain-ai/langchain-nvidia/issues/26 for chat completion models
@aishwaryap is it possible one of the documents you are sending to FAISS is empty? the service rejects empty content.
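For anyone who does hit this with genuinely empty chunks, a minimal sketch of filtering them out before indexing (docs and embedder here stand in for whatever document list and NVIDIAEmbeddings instance you are using):

from langchain.vectorstores import FAISS

# Drop documents whose page_content is empty or whitespace-only before
# sending them to the embedding service, which rejects empty content.
# "docs" and "embedder" are placeholders for your own objects.
non_empty_docs = [d for d in docs if d.page_content and d.page_content.strip()]
vectorstore = FAISS.from_documents(non_empty_docs, embedder)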
@mattf I'm reasonably sure they are not. I took a working example with the nvolveqa_40k model and replaced just the embedding model with ai-embed-qa-4 to get this issue. I assume empty documents would error out with any model. That said, I can create a self-contained example and add it for testing.
Sample self-contained script:
import os

from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

os.environ["NVIDIA_API_KEY"] = "nvapi-<redacted>"

# Load a single documentation page and convert the HTML to plain text.
urls = ["https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/"]
loader = AsyncHtmlLoader(urls)
docs = loader.load()
html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(docs)

# Split the page into overlapping chunks.
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs_transformed)
print("Number of chunks: ", len(chunks))
print("----------------------")
print("Chunk text at index 2: \n")
print(chunks[1].page_content)
print("----------------------")

# Works: index and query the chunks with the nvolveqa_40k embedding model.
nvolveqa_embedder = NVIDIAEmbeddings(model="nvolveqa_40k")
nvolveqa_vectorstore = FAISS.from_documents(chunks, nvolveqa_embedder)
nvolveqa_retriever = nvolveqa_vectorstore.as_retriever()
user_input = "How do I query NVIDIA models in LangChain?"
nvolveqa_retrieved = nvolveqa_retriever.get_relevant_documents(user_input)
print("----------------------")
print("Top retrieved chunk text from nvolveqa-40k: \n")
print(nvolveqa_retrieved[0].page_content)
print("----------------------")

# Fails: the same pipeline with ai-embed-qa-4 raises [400] Bad Request
# inside FAISS.from_documents.
embedqa4_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
embedqa4_vectorstore = FAISS.from_documents(chunks, embedqa4_embedder)
embedqa4_retriever = embedqa4_vectorstore.as_retriever()
user_input = "How do I query NVIDIA models in LangChain?"
embedqa4_retrieved = embedqa4_retriever.get_relevant_documents(user_input)
print("----------------------")
print("Top retrieved chunk text from ai-embed-qa-4: \n")
print(embedqa4_retrieved[0].page_content)
print("----------------------")
My output (stderr + stdout):
Fetching pages: 100%|###########################################################################################| 1/1 [00:00<00:00, 27.39it/s]
Created a chunk of size 2313, which is longer than the specified 2000
Number of chunks: 22
----------------------
Chunk text at index 2:
* Graphs
* Callbacks
* Chat loaders
* Adapters
* Stores
* * Components
* Chat models
* NVIDIA AI Foundation Endpoints
On this page
# NVIDIA AI Foundation Endpoints
The `ChatNVIDIA` class is a LangChain chat model that connects to NVIDIA AI
Foundation Endpoints.
> NVIDIA AI Foundation Endpoints give users easy access to NVIDIA hosted API
> endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable
> Diffusion, etc. These models, hosted on the NVIDIA NGC catalog, are
> optimized, tested, and hosted on the NVIDIA AI platform, making them fast
> and easy to evaluate, further customize, and seamlessly run at peak
> performance on any accelerated stack.
>
> With NVIDIA AI Foundation Endpoints, you can get quick results from a fully
> accelerated stack running on NVIDIA DGX Cloud. Once customized, these models
> can be deployed anywhere with enterprise-grade security, stability, and
> support using NVIDIA AI Enterprise.
>
> These models can be easily accessed via the `langchain-nvidia-ai-endpoints`
> package, as shown below.
This example goes over how to use LangChain to interact with and develop LLM-
powered systems using the publicly-accessible AI Foundation endpoints.
## Installation
%pip install --upgrade --quiet langchain-nvidia-ai-endpoints
Note: you may need to restart the kernel to use updated packages.
## Setup
**To get started:**
1. Create a free account with the NVIDIA NGC service, which hosts AI solution catalogs, containers, models, etc.
2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.
3. Select the `API` option and click `Generate Key`.
4. Save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.
----------------------
----------------------
Top retrieved chunk text from nvolveqa-40k:
* Graphs
* Callbacks
* Chat loaders
* Adapters
* Stores
* * Components
* Chat models
* NVIDIA AI Foundation Endpoints
On this page
# NVIDIA AI Foundation Endpoints
The `ChatNVIDIA` class is a LangChain chat model that connects to NVIDIA AI
Foundation Endpoints.
> NVIDIA AI Foundation Endpoints give users easy access to NVIDIA hosted API
> endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable
> Diffusion, etc. These models, hosted on the NVIDIA NGC catalog, are
> optimized, tested, and hosted on the NVIDIA AI platform, making them fast
> and easy to evaluate, further customize, and seamlessly run at peak
> performance on any accelerated stack.
>
> With NVIDIA AI Foundation Endpoints, you can get quick results from a fully
> accelerated stack running on NVIDIA DGX Cloud. Once customized, these models
> can be deployed anywhere with enterprise-grade security, stability, and
> support using NVIDIA AI Enterprise.
>
> These models can be easily accessed via the `langchain-nvidia-ai-endpoints`
> package, as shown below.
This example goes over how to use LangChain to interact with and develop LLM-
powered systems using the publicly-accessible AI Foundation endpoints.
## Installation
%pip install --upgrade --quiet langchain-nvidia-ai-endpoints
Note: you may need to restart the kernel to use updated packages.
## Setup
**To get started:**
1. Create a free account with the NVIDIA NGC service, which hosts AI solution catalogs, containers, models, etc.
2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.
3. Select the `API` option and click `Generate Key`.
4. Save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.
----------------------
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[1], line 32
30 print("----------------------")
31 embedqa4_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
---> 32 embedqa4_vectorstore = FAISS.from_documents(chunks, embedqa4_embedder)
33 embedqa4_retriever = embedqa4_vectorstore.as_retriever()
34 user_input = "How do I query NVIDIA models in LangChain?"
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_core/vectorstores.py:550, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
548 texts = [d.page_content for d in documents]
549 metadatas = [d.metadata for d in documents]
--> 550 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_community/vectorstores/faiss.py:930, in FAISS.from_texts(cls, texts, embedding, metadatas, ids, **kwargs)
903 @classmethod
904 def from_texts(
905 cls,
(...)
910 **kwargs: Any,
911 ) -> FAISS:
912 """Construct FAISS wrapper from raw documents.
913
914 This is a user friendly interface that:
(...)
928 faiss = FAISS.from_texts(texts, embeddings)
929 """
--> 930 embeddings = embedding.embed_documents(texts)
931 return cls.__from(
932 texts,
933 embeddings,
(...)
937 **kwargs,
938 )
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/embeddings.py:142, in NVIDIAEmbeddings.embed_documents(self, texts)
136 batch = texts[i : i + self.max_batch_size]
137 truncated = [
138 text[: self.max_length] if len(text) > self.max_length else text
139 for text in batch
140 ]
141 all_embeddings.extend(
--> 142 self._embed(truncated, model_type=self.model_type or "passage")
143 )
144 return all_embeddings
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/embeddings.py:105, in NVIDIAEmbeddings._embed(self, texts, model_type)
102 if self.truncate:
103 payload["truncate"] = self.truncate
--> 105 response = self.client.get_req(
106 model_name=self.model,
107 payload=payload,
108 endpoint="infer",
109 )
110 response.raise_for_status()
111 result = response.json()
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:392, in NVEModel.get_req(self, model_name, payload, invoke_url, stop, endpoint)
390 if payload.get("stream", False) is True:
391 payload = {**payload, "stream": False}
--> 392 response, session = self._post(invoke_url, payload)
393 return self._wait(response, session)
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:220, in NVEModel._post(self, invoke_url, payload)
218 session = self.get_session_fn()
219 self.last_response = response = session.post(**self.last_inputs)
--> 220 self._try_raise(response)
221 return response, session
File ~/Documents/venvs/nvaif_env/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:303, in NVEModel._try_raise(self, response)
301 body += "\nPlease check or regenerate your API key."
302 # todo: raise as an HTTPError
--> 303 raise Exception(f"{header}\n{body}") from None
Exception: [400] Bad Request
Inference error
RequestID: 11a69c13-e43c-4f0e-960d-98f4a3f4f706
Also verified using pip show that I am on version 0.0.9 (langchain-nvidia-ai-endpoints does not have a __version__ attribute to verify in code; filed as a separate issue).
(nvaif_env) ➜ ~ pip show langchain-nvidia-ai-endpoints
Name: langchain-nvidia-ai-endpoints
Version: 0.0.9
Summary: An integration package connecting NVIDIA AI Endpoints and LangChain
Home-page: https://github.com/langchain-ai/langchain
Author:
Author-email:
License: MIT
Location: <redacted>
Requires: aiohttp, langchain-core, pillow
Required-by:
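As a side note, one way to check the installed version from code despite the missing __version__ attribute is to read the distribution metadata; a minimal sketch:

from importlib.metadata import version

# The package does not expose __version__, but the installed distribution
# metadata still records it (standard library, Python 3.8+).
print(version("langchain-nvidia-ai-endpoints"))  # e.g. 0.0.9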
@aishwaryap thank you for the reproducer, it helped me narrow this down.
i believe the issue is that some of the inputs are longer than the embedding model allows. in this case you can pass truncate="END", e.g. NVIDIAEmbeddings(model="ai-embed-qa-4", truncate="END").
this is not an issue w/ the nvolveqa_40k model because it would silently truncate your input, while the new models reject over-length input by default.
does that resolve your issue?
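For reference, a minimal sketch of that suggestion applied to the reproducer above; only the truncate="END" argument changes, and chunks is the same list built by the earlier script:

from langchain.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# Same pipeline as the reproducer, with only the truncate argument added;
# "chunks" is the chunk list built by the earlier script.
embedqa4_embedder = NVIDIAEmbeddings(model="ai-embed-qa-4", truncate="END")
embedqa4_vectorstore = FAISS.from_documents(chunks, embedqa4_embedder)
embedqa4_retriever = embedqa4_vectorstore.as_retriever()
print(embedqa4_retriever.get_relevant_documents(
    "How do I query NVIDIA models in LangChain?"
)[0].page_content)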
Hi @mattf, I just found this thread and I just wanted to say that your suggestion worked... at least for the issue at vectorstore = FAISS.from_documents(documents, document_embedder), so thanks for that information. However, I wanted to ask a follow-up question, since a new error arises when working with a chain and the vectorstore retriever. I'm trying to ingest scientific articles in PDF format, and after running the following chain:
# imports assumed (not shown in the original comment); llm and vectorstore
# are defined earlier and not shown here
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

retriever = vectorstore.as_retriever()
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer solely based on the following context:\n<Documents>\n{context}\n</Documents>",
        ),
        ("user", "{question}"),
    ]
)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
response = chain.invoke("A question for the PDF")
I get the following error:
.venv\Lib\site-packages\langchain_nvidia_ai_endpoints\_common.py", line 311, in _try_raise
    raise Exception(f"{header}\n{body}") from None
Exception: [500] Internal Server Error
Input value error: prompt is [[5201]] long while only 2048 is supported
Any ideas how to solve this part? Thanks in advance for any help you may provide!
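One possible mitigation, not discussed in the thread, is to send less retrieved context to the small model, e.g. by retrieving fewer chunks; a minimal sketch:

# Retrieve fewer context chunks so the assembled prompt stays under the
# model's 2048-token limit; k=2 is an arbitrary illustrative value.
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})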
@apolo74 please open this as a new issue, it appears unrelated to embedding and has an informative error
Hi again @mattf, a couple of minutes ago I solved this... I was using a small model (microsoft/phi-3-mini-4k-instruct). There were no more errors the moment I switched to larger models, so the error was related to the size of the LLM. Thanks for your help!
@aishwaryap recent changes server-side should have fully resolved this. please reopen this if you still have an issue.
I am trying to experiment with different embedding models in a RAG application, building off of the example here. It works fine when I create an NVIDIAEmbeddings object with model="nvolveqa_40k", but with model="ai-embed-qa-4" it fails at the vectorstore creation step, i.e. at FAISS.from_documents, with the uninformative [400] Bad Request / Inference error shown in the traceback above.
I had noticed that for generation models this error sometimes simply means that a newer package version is required, and I have filed an issue requesting more informative errors in that case, but with this model I get the error even with the latest version (0.0.9) and a newly generated API key.
If the model is not yet supported, can it be hidden from the output of available_models?