vecorro closed this issue 1 month ago
To resolve the HTTPError: 400 Client Error: Bad Request for url: http://10.237.119.242:8010/v1/ranking
issue when using the Nvidia NIM Reranker in your Python code, ensure that the URL and payload format match the expected format for the on-premises NIM service.
Here is a checklist and example code to help you troubleshoot and resolve the issue:
1. Verify that the request path is /v1/ranking.
2. Include the Authorization header if required.
Here is an example of how you can structure your request:
import requests

class NVIDIARerank:
    def __init__(self, base_url, api_key=None):
        self._base_url = base_url
        self._api_key = api_key

    def rerank(self, query_str, nodes):
        session = requests.Session()
        headers = {
            "Authorization": f"Bearer {self._api_key}",
            "Accept": "application/json",
        }
        payload = {
            "model": "nv-rerank-qa-mistral-4b:1",
            "query": {"text": query_str},
            "passages": [{"text": node} for node in nodes],
        }
        url = f"{self._base_url}/ranking"
        response = session.post(url, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()

# Example usage
base_url = "http://10.237.119.242:8010/v1"
api_key = "your_api_key"  # If required

reranker = NVIDIARerank(base_url, api_key)
query_str = "which way should i go?"
nodes = [
    "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;",
    "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,",
    "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back.",
    "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference.",
]

try:
    result = reranker.rerank(query_str, nodes)
    print(result)
except requests.HTTPError as e:
    print(f"HTTP error occurred: {e}")
Ensure that:
- base_url is correctly set to your on-premises NIM service.
- The Authorization header is included if your service requires an API key.
If the service works correctly with a curl command, compare the curl request with the Python request to ensure they are identical in terms of URL, headers, and payload.
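One way to do that comparison without firing a live request is to build the request with requests and inspect the prepared URL, headers, and body before lining them up against the working curl command. A minimal sketch, using the host, model name, and placeholder API key from this thread:

```python
import json
import requests

# Placeholder values from this thread; substitute your own service details.
base_url = "http://10.237.119.242:8010/v1"
payload = {
    "model": "nv-rerank-qa-mistral-4b:1",
    "query": {"text": "which way should i go?"},
    "passages": [{"text": "two roads diverged in a yellow wood"}],
}
req = requests.Request(
    "POST",
    f"{base_url}/ranking",
    headers={"Authorization": "Bearer your_api_key", "Accept": "application/json"},
    json=payload,
)
# prepare() renders the exact URL, headers, and body that would go on the wire.
prepared = req.prepare()

print(prepared.url)                        # compare to the curl URL
print(prepared.headers["Content-Type"])    # compare to curl -H headers
print(json.loads(prepared.body)["model"])  # compare to the curl -d body
```

Any mismatch between these printed values and the curl invocation is a likely source of the 400 response.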
Additionally, you can refer to the NVIDIA NIMs documentation for more details on setting up and troubleshooting local microservice instances. This documentation provides comprehensive guidance on working with NVIDIA NIMs, both hosted and local instances [1][2].
To continue talking to Dosu, mention @dosu.
The documentation dosubot is recommending is wrong; for instance, this can't be right:
from llama_index.llms.nvidia import NVIDIA
# connect to a rerank NIM running at localhost:1976
reranker = NVIDIARerank(base_url="http://localhost:1976/v1")
The LlamaIndex documentation about the NIM class needs revision; perhaps the implementation of that module also requires corrections.
To address the issues with the Nvidia NIM Reranker and ensure the documentation and implementation are correct, let's focus on the key points:
Initialization: Ensure you initialize the NVIDIARerank class with the correct parameters, including the API key and base URL if needed.
from llama_index.postprocessor.nvidia_rerank import NVIDIARerank

reranker = NVIDIARerank(
    model="nv-rerank-qa-mistral-4b:1",
    nvidia_api_key="your_api_key",
    base_url="http://10.237.119.242:8010/v1",
    top_n=5,
    max_batch_size=64,
)
Payload Structure: The payload structure expected by the Nvidia NIM Reranker API should be as follows:
{
    "model": "nv-rerank-qa-mistral-4b:1",
    "query": {
        "text": "which way should i go?"
    },
    "passages": [
        {
            "text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"
        },
        {
            "text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"
        },
        {
            "text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."
        },
        {
            "text": "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."
        }
    ]
}
Example Code:
Here is an example of how to use the NVIDIARerank class correctly:
import requests

class NVIDIARerank:
    def __init__(self, base_url, api_key=None):
        self._base_url = base_url
        self._api_key = api_key

    def rerank(self, query_str, nodes):
        session = requests.Session()
        headers = {
            "Authorization": f"Bearer {self._api_key}",
            "Accept": "application/json",
        }
        payload = {
            "model": "nv-rerank-qa-mistral-4b:1",
            "query": {"text": query_str},
            "passages": [{"text": node} for node in nodes],
        }
        url = f"{self._base_url}/ranking"
        response = session.post(url, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()

# Example usage
base_url = "http://10.237.119.242:8010/v1"
api_key = "your_api_key"  # If required

reranker = NVIDIARerank(base_url, api_key)
query_str = "which way should i go?"
nodes = [
    "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;",
    "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,",
    "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back.",
    "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference.",
]

try:
    result = reranker.rerank(query_str, nodes)
    print(result)
except requests.HTTPError as e:
    print(f"HTTP error occurred: {e}")
The NVIDIARerank class should be well-documented, including detailed descriptions of its parameters and methods. The implementation should include validation and error handling to ensure correct usage.
By following these guidelines and ensuring the correct setup, you should be able to resolve the HTTPError: 400 Client Error: Bad Request issue and use the NVIDIARerank class effectively [1][2][3].
It looks like we need your specialized knowledge to address the documentation concerns and the implementation details, @logan-markewich.
There must be something wrong with either the NIM client implementation or the documentation. I have double-checked my code and all looks the way @dosu indicates:
re_ranker = NVIDIARerank(
    model="nvidia/nv-rerankqa-mistral-4b-v3",  # rerank_cfg.model
    base_url="http://10.237.119.242:8010/v1",  # rerank_cfg.api_base
    top_n=4,  # rerank_cfg.rerank_top_n
)
and I still get the same error.
The HTTPError: 400 Client Error: Bad Request issue could be due to several factors. Here are some specific points to check in the NVIDIARerank implementation and usage:
API Key: Ensure the API key is correctly set and included in the headers. If the API key is missing or incorrect, it can cause a 400 error.
URL Path: Verify that the URL path is correct. For on-premises NIM, the path should be /v1/ranking.
Payload Structure: Ensure the payload is correctly structured. The payload should include the model, query, and passages.
Headers: Ensure the headers include the correct authorization token.
Here is a checklist and example code to help you troubleshoot:
API Key:
self._api_key = get_from_param_or_env(
    "api_key",
    nvidia_api_key or api_key,
    "NVIDIA_API_KEY",
    "NO_API_KEY_PROVIDED",
)
URL Path:
url = self._base_url
if self._is_hosted:
    url += "/retrieval/nvidia/reranking"
else:
    url += "/ranking"
Payload Structure:
payloads = {
    "model": self.model,
    "query": {"text": query_bundle.query_str},
    "passages": [
        {"text": n.get_content(metadata_mode=MetadataMode.EMBED)}
        for n in batch
    ],
}
Headers:
_headers = {
    "Authorization": f"Bearer {self._api_key}",
    "Accept": "application/json",
}
Here is an example of how to use the NVIDIARerank class correctly:
import requests

class NVIDIARerank:
    def __init__(self, model, base_url, nvidia_api_key=None, top_n=4):
        self.model = model
        self._base_url = base_url
        self._api_key = nvidia_api_key
        self.top_n = top_n

    def rerank(self, query_str, nodes):
        session = requests.Session()
        headers = {
            "Authorization": f"Bearer {self._api_key}",
            "Accept": "application/json",
        }
        payload = {
            "model": self.model,
            "query": {"text": query_str},
            "passages": [{"text": node} for node in nodes],
        }
        url = f"{self._base_url}/ranking"
        response = session.post(url, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()

# Example usage
base_url = "http://10.237.119.242:8010/v1"
api_key = "your_api_key"  # If required

reranker = NVIDIARerank(
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    base_url=base_url,
    nvidia_api_key=api_key,
    top_n=4,
)
query_str = "which way should i go?"
nodes = [
    "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;",
    "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,",
    "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back.",
    "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference.",
]

try:
    result = reranker.rerank(query_str, nodes)
    print(result)
except requests.HTTPError as e:
    print(f"HTTP error occurred: {e}")
Ensure that:
- base_url is correctly set to your on-premises NIM service.
- The Authorization header is included if your service requires an API key.
If the issue persists, double-check the payload and headers against the working curl command to ensure they are identical [1].
Hi, @logan-markewich. I reproduced the following tutorial from Nvidia: https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/playbook.html. My local NIM setup works properly when using the LangChain re-ranker, so there must be something wrong with LlamaIndex's from llama_index.postprocessor.nvidia_rerank import NVIDIARerank.
Thanks
Enrique
Hi,
FYI @logan-markewich and @dosu. I watched the traces from the re-ranker NIM and this is the error:
2024-08-15T18:21:40Z ERROR: root - Uncaught InferenceServerException: [StatusCode.INTERNAL] in ensemble 'nvidia_nv_rerankqa_mistral_4b_v3', Input length 1024 exceeds maximum allowed token size 512; Request: <starlette.requests.Request object at 0x7f646014d180>
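That trace shows the passages themselves exceed the model's 512-token limit. Until the server-side truncate flag is exposed, one client-side workaround is to pre-truncate each passage before sending it. A rough sketch, assuming a crude whitespace-word approximation of tokens (real NIM tokenization is model-specific, so the cap here is only an estimate and you should leave headroom):

```python
def truncate_passage(text: str, max_tokens: int = 512) -> str:
    """Crudely cap a passage at max_tokens whitespace-separated words.

    This only approximates the model's real tokenizer; if you still hit the
    512-token server limit, pass a smaller cap (e.g. max_tokens=400).
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])

long_passage = "road " * 1024  # 1024 words, well past the limit
short = truncate_passage(long_passage, max_tokens=400)
print(len(short.split()))  # capped at 400 words
```

Applying this to each node before calling the re-ranker avoids the InferenceServerException above at the cost of silently dropping the tail of long passages.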
The solution would be to let the NVIDIARerank initialization pass the truncate flag to the re-ranker, just as is possible with the NVIDIAEmbedding module.
I tried this fix in NVIDIARerank's base.py module. I made a couple of additions (commented as vecorro):
1) Added an extra header as indicated in Nvidia's documentation:
_headers = {
    "Authorization": f"Bearer {self._api_key}",
    "Accept": "application/json",
    "Content-Type": "application/json",  # vecorro
}
2) Added the truncate value at the end of the payload:
payloads = {
    "model": self.model,
    "query": {"text": query_bundle.query_str},
    "passages": [
        {"text": n.get_content(metadata_mode=MetadataMode.EMBED)}
        for n in batch
    ],
    "truncate": "END",  # vecorro
}
Given the small token limit supported by most Nvidia models shipped with embedding and re-ranking NIMs (512 tokens), I think it would be good for LlamaIndex's integration modules to define {"truncate": "END"} as the default so things work correctly.
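As a sketch of what that default could look like at the payload level (the helper name below is illustrative, not LlamaIndex API; the three accepted truncate values match the field definition shown later in this thread):

```python
def build_rerank_payload(model, query, passages, truncate="END"):
    """Assemble a NIM /v1/ranking payload with truncation enabled by default.

    truncate may be "NONE", "START", or "END"; defaulting to "END" avoids
    server errors when a passage exceeds the 512-token model limit.
    """
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
        "truncate": truncate,
    }

payload = build_rerank_payload(
    "nvidia/nv-rerankqa-mistral-4b-v3",
    "which way should i go?",
    ["two roads diverged in a yellow wood"],
)
print(payload["truncate"])  # END
```

Callers who want the current strict behavior could still pass truncate="NONE" explicitly.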
I'm part of the VMware/Broadcom team working with customers to implement our Private AI solutions with Nvidia, and it would help those customers fond of LlamaIndex get this fix implemented. Thanks
To address the issue with the NVIDIARerank class and ensure it handles the truncate flag correctly, you can update the class to include the truncate parameter and pass it to the re-ranker:
1. Add a truncate field to the NVIDIARerank class.
2. Pass the truncate flag to the re-ranker.
Here is the updated code for the NVIDIARerank class:
class NVIDIARerank(BaseNodePostprocessor):
    """NVIDIA's API Catalog Reranker Connector."""

    class Config:
        validate_assignment = True

    model: Optional[str] = Field(
        default=DEFAULT_MODEL,
        description="The NVIDIA API Catalog reranker to use.",
    )
    top_n: Optional[int] = Field(
        default=5,
        ge=0,
        description="The number of nodes to return.",
    )
    max_batch_size: Optional[int] = Field(
        default=64,
        ge=1,
        description="The maximum batch size supported by the inference server.",
    )
    truncate: Literal["NONE", "START", "END"] = Field(
        default="NONE",
        description=(
            "Truncate input text if it exceeds the model's maximum token length. "
            "Default is 'NONE', which raises an error if an input is too long."
        ),
    )
    _api_key: str = PrivateAttr("NO_API_KEY_PROVIDED")  # TODO: should be SecretStr
    _mode: str = PrivateAttr("nvidia")
    _is_hosted: bool = PrivateAttr(True)
    _base_url: str = PrivateAttr(BASE_URL)

    def _set_api_key(self, nvidia_api_key: str = None, api_key: str = None) -> None:
        self._api_key = get_from_param_or_env(
            "api_key",
            nvidia_api_key or api_key,
            "NVIDIA_API_KEY",
            "NO_API_KEY_PROVIDED",
        )

    def __init__(
        self,
        model: str = DEFAULT_MODEL,
        nvidia_api_key: Optional[str] = None,
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        **kwargs: Any,
    ):
        """
        Initialize a NVIDIARerank instance.

        This class provides access to a NVIDIA NIM for reranking. By default, it connects to a hosted NIM, but can be configured to connect to an on-premises NIM using the `base_url` parameter. An API key is required for hosted NIM.

        Args:
            model (str): The model to use for reranking.
            nvidia_api_key (str, optional): The NVIDIA API key. Defaults to None.
            api_key (str, optional): The API key. Defaults to None.
            base_url (str, optional): The base URL of the on-premises NIM. Defaults to None.
            **kwargs: Additional keyword arguments.
        """
        super().__init__(model=model, **kwargs)
        if base_url is None or base_url in MODEL_ENDPOINT_MAP.values():
            base_url = MODEL_ENDPOINT_MAP.get(model, BASE_URL)
        else:
            base_url = self._validate_url(base_url)
        self._base_url = base_url
        self._api_key = get_from_param_or_env(
            "api_key",
            nvidia_api_key or api_key,
            "NVIDIA_API_KEY",
            "NO_API_KEY_PROVIDED",
        )
        self._is_hosted = self._base_url in KNOWN_URLS
        if self._is_hosted and self._api_key == "NO_API_KEY_PROVIDED":
            warnings.warn(
                "An API key is required for hosted NIM. This will become an error in 0.2.0."
            )

    def _validate_url(self, base_url):
        """
        Base URL validation.

        ValueError: URLs that do not have a valid scheme and netloc.
        Warning: v1/rankings routes.
        ValueError: any other routes.
        """
        expected_format = "Expected format is 'http://host:port'."
        result = urlparse(base_url)
        if not (result.scheme and result.netloc):
            raise ValueError(
                f"Invalid base_url, Expected format is 'http://host:port': {base_url}"
            )
        if result.path:
            normalized_path = result.path.strip("/")
            if normalized_path == "v1":
                pass
            elif normalized_path == "v1/rankings":
                warnings.warn(f"{expected_format} Rest is ignored.")
            else:
                raise ValueError(f"Base URL path is not recognized. {expected_format}")
        return urlunparse((result.scheme, result.netloc, "v1", "", "", ""))

    @property
    def available_models(self) -> List[Model]:
        """Get available models."""
        # all available models are in the map
        ids = MODEL_ENDPOINT_MAP.keys()
        return [Model(id=id) for id in ids]

    def rerank(self, query: str, documents: List[str]) -> List[float]:
        """Re-rank documents based on the query."""
        session = requests.Session()
        _headers = {
            "Authorization": f"Bearer {self._api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",  # vecorro
        }
        payloads = {
            "model": self.model,
            "query": {"text": query},
            "passages": [{"text": doc} for doc in documents],
            "truncate": self.truncate,  # vecorro
        }
        url = self._base_url
        if self._is_hosted:
            if url.endswith("/v1"):
                url += "/retrieval/nvidia/reranking"
        else:
            url += "/ranking"
        response = session.post(url, headers=_headers, json=payloads)
        response.raise_for_status()
        return [result["logit"] for result in response.json()["rankings"]]

    async def arerank(self, query: str, documents: List[str]) -> List[float]:
        """Asynchronously re-rank documents based on the query."""
        async with aiohttp.ClientSession() as session:
            _headers = {
                "Authorization": f"Bearer {self._api_key}",
                "Accept": "application/json",
                "Content-Type": "application/json",  # vecorro
            }
            payloads = {
                "model": self.model,
                "query": {"text": query},
                "passages": [{"text": doc} for doc in documents],
                "truncate": self.truncate,  # vecorro
            }
            url = self._base_url
            if self._is_hosted:
                if url.endswith("/v1"):
                    url += "/retrieval/nvidia/reranking"
            else:
                url += "/ranking"
            async with session.post(url, headers=_headers, json=payloads) as response:
                response.raise_for_status()
                return [result["logit"] for result in (await response.json())["rankings"]]
This code adds the truncate field to the NVIDIARerank class and updates the rerank and arerank methods to pass the truncate flag to the re-ranker. This ensures that input exceeding the maximum allowed token size is handled appropriately, similar to the NVIDIAEmbedding module [1][2].
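The URL normalization done by _validate_url above can be exercised on its own. A standalone sketch of the same logic, reimplemented here with only the standard library (function name and messages are illustrative, not the library's API):

```python
import warnings
from urllib.parse import urlparse, urlunparse

def validate_base_url(base_url: str) -> str:
    """Normalize a NIM base URL to 'scheme://host:port/v1', mirroring _validate_url."""
    expected = "Expected format is 'http://host:port'."
    result = urlparse(base_url)
    if not (result.scheme and result.netloc):
        raise ValueError(f"Invalid base_url. {expected} Got: {base_url}")
    path = result.path.strip("/")
    if path and path != "v1":
        if path == "v1/rankings":
            warnings.warn(f"{expected} Rest is ignored.")
        else:
            raise ValueError(f"Base URL path is not recognized. {expected}")
    # Always rebuild with a bare /v1 path, dropping query and fragment.
    return urlunparse((result.scheme, result.netloc, "v1", "", "", ""))

print(validate_base_url("http://10.237.119.242:8010"))     # /v1 gets appended
print(validate_base_url("http://10.237.119.242:8010/v1"))  # already normalized
```

This makes the contract explicit: base_url should be just scheme, host, and port; the connector owns the /v1 suffix and the per-endpoint path.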
A maintainer has already been looped in to address the documentation and implementation concerns.
The bug was fixed in llama-index-postprocessor-nvidia-rerank==0.3.1
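A quick way to confirm an environment carries the fixed release is a numeric comparison of the installed version string against 0.3.1 (the helper below is illustrative; plain tuple comparison is enough for these dotted release strings):

```python
def has_truncate_fix(installed: str, fixed: str = "0.3.1") -> bool:
    """Return True if the installed llama-index-postprocessor-nvidia-rerank
    version is at least the release carrying the truncate fix."""
    def to_tuple(version: str):
        return tuple(int(part) for part in version.split("."))
    return to_tuple(installed) >= to_tuple(fixed)

print(has_truncate_fix("0.1.5"))  # False: the version from this report, pre-fix
print(has_truncate_fix("0.3.1"))  # True
```

If this returns False for your installed version, upgrading the package should resolve the 400 errors described in this issue.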
Bug Description
I'm trying to use a reranker NIM to replace a HuggingFace model. The NIM service works properly from curl. Example:
When I try this reranker, I get
HTTPError: 400 Client Error: Bad Request for url: http://10.237.119.242:8010/v1/ranking
Version
llama-index 0.10.58 / llama-index-postprocessor-nvidia-rerank 0.1.5
Steps to Reproduce
Relevant Logs/Tracebacks