Open nitskoh opened 3 months ago
Hey @nitskoh! 👋 I'm here to help you with bugs, questions, and more while you wait for a human maintainer. Let's squash those bugs together!
To resolve the `JSONDecodeError` encountered when running the text embedding inference code snippet from the LlamaIndex example, ensure that the model name you are using is correct and supported by Predibase. The error suggests that the model deployment ID might not be found at Predibase.
Here are some steps to troubleshoot and fix the issue:
1. **Verify Model Name**: Ensure that the `model_name` you are using is correct and listed among the Predibase supported models. You can refer to the Predibase documentation for the list of supported models: Predibase Models.
2. **Check API Token**: Ensure that the `PREDIBASE_API_TOKEN` is correctly set and valid.
3. **Update Base URL**: Ensure that the `base_url` is correctly set to the endpoint of the text embeddings service. The default URL is `http://127.0.0.1:8080`, but it might need to be updated to match your deployment configuration.
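The three checks above can be sketched as a small pre-flight helper. This is illustrative only: `SUPPORTED_MODELS` is a placeholder (consult the Predibase documentation for the real list), and `preflight` is a hypothetical name, not part of LlamaIndex or Predibase.

```python
import os
from urllib.parse import urlparse

# Placeholder set -- the authoritative list is in the Predibase documentation.
SUPPORTED_MODELS = {"BAAI/bge-large-en-v1.5", "BAAI/bge-base-en-v1.5"}


def preflight(model_name: str, base_url: str) -> list:
    """Return a list of configuration problems (empty if the basics look OK)."""
    problems = []
    if model_name not in SUPPORTED_MODELS:
        problems.append(f"model {model_name!r} not in supported list")
    if not os.environ.get("PREDIBASE_API_TOKEN"):
        problems.append("PREDIBASE_API_TOKEN is not set")
    if urlparse(base_url).scheme not in ("http", "https"):
        problems.append(f"base_url {base_url!r} is not an http(s) URL")
    return problems
```

Running such a check before instantiating the embedding client makes all three failure modes visible at once, instead of surfacing later as an opaque `JSONDecodeError`.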
Here is the relevant part of the code where the error is raised:
```python
import httpx


def _call_api(self, texts: List[str]) -> List[List[float]]:
    headers = {"Content-Type": "application/json"}
    if self.auth_token is not None:
        if callable(self.auth_token):
            headers["Authorization"] = self.auth_token(self.base_url)
        else:
            headers["Authorization"] = self.auth_token
    json_data = {"inputs": texts, "truncate": self.truncate_text}
    with httpx.Client() as client:
        response = client.post(
            f"{self.base_url}/embed",
            headers=headers,
            json=json_data,
            timeout=self.timeout,
        )
    return response.json()
```
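When the deployment is missing, the server typically replies with a plain-text or HTML error page rather than JSON, so `response.json()` is exactly where the `JSONDecodeError` surfaces. A defensive variant, sketched here as a hypothetical stdlib-only helper, turns that into a readable message:

```python
import json


def parse_embed_response(status_code: int, body: str):
    """Hypothetical helper: parse an /embed reply, failing with a clear
    message instead of a bare JSONDecodeError when the server is down
    or misconfigured."""
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code} from embedding server: {body[:200]}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"Server replied with non-JSON (is the deployment up?): {body[:200]}"
        ) from exc
```

In `_call_api`, calling `parse_embed_response(response.status_code, response.text)` instead of `response.json()` would make a missing deployment fail loudly with the server's actual error text.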
Additionally, if you are using the `HuggingFaceEmbedding` class from LlamaIndex, ensure that the model and tokenizer are correctly initialized. Here is a snippet from the `HuggingFaceEmbedding` class:
```python
class HuggingFaceEmbedding(BaseEmbedding):
    tokenizer_name: str = Field(description="Tokenizer name from HuggingFace.")
    max_length: int = Field(
        default=DEFAULT_HUGGINGFACE_LENGTH, description="Maximum length of input.", gt=0
    )
    pooling: Pooling = Field(default=None, description="Pooling strategy.")
    normalize: bool = Field(default=True, description="Normalize embeddings or not.")
    query_instruction: Optional[str] = Field(
        description="Instruction to prepend to query text."
    )
    text_instruction: Optional[str] = Field(
        description="Instruction to prepend to text."
    )
    cache_folder: Optional[str] = Field(
        description="Cache folder for huggingface files."
    )

    _model: Any = PrivateAttr()
    _tokenizer: Any = PrivateAttr()
    _device: str = PrivateAttr()

    def __init__(
        self,
        model_name: Optional[str] = None,
        tokenizer_name: Optional[str] = None,
        pooling: Optional[str] = None,
        max_length: Optional[int] = None,
        query_instruction: Optional[str] = None,
        text_instruction: Optional[str] = None,
        normalize: bool = True,
        model: Optional[Any] = None,
        tokenizer: Optional[Any] = None,
        embed_batch_size: int = DEFAULT_EMBED_BATCH_SIZE,
        cache_folder: Optional[str] = None,
        trust_remote_code: bool = False,
        device: Optional[str] = None,
        callback_manager: Optional[CallbackManager] = None,
    ):
        try:
            from transformers import AutoModel, AutoTokenizer
        except ImportError:
            raise ImportError(
                "HuggingFaceEmbedding requires transformers to be installed.\n"
                "Please install transformers with `pip install transformers`."
            )

        self._device = device or infer_torch_device()
        cache_folder = cache_folder or get_cache_dir()

        if model is None:  # Use model_name with AutoModel
            model_name = (
                model_name
                if model_name is not None
                else DEFAULT_HUGGINGFACE_EMBEDDING_MODEL
            )
            model = AutoModel.from_pretrained(
                model_name, cache_dir=cache_folder, trust_remote_code=trust_remote_code
            )
        elif model_name is None:  # Extract model_name from model
            model_name = model.name_or_path
        self._model = model.to(self._device)

        if tokenizer is None:  # Use tokenizer_name with AutoTokenizer
            tokenizer_name = (
                model_name or tokenizer_name or DEFAULT_HUGGINGFACE_EMBEDDING_MODEL
            )
            tokenizer = AutoTokenizer.from_pretrained(
                tokenizer_name, cache_dir=cache_folder
            )
        elif tokenizer_name is None:  # Extract tokenizer_name from tokenizer
            tokenizer_name = tokenizer.name_or_path
        self._tokenizer = tokenizer

        if max_length is None:
            try:
                max_length = int(self._model.config.max_position_embeddings)
            except AttributeError as exc:
                raise ValueError(
                    "Unable to find max_length from model config. Please specify max_length."
                ) from exc

        if not pooling:
            pooling = get_pooling_mode(model_name)
        try:
            pooling = Pooling(pooling)
        except ValueError as exc:
            raise NotImplementedError(
                f"Pooling {pooling} unsupported, please pick one in"
                f" {[p.value for p in Pooling]}."
            ) from exc

        super().__init__(
            embed_batch_size=embed_batch_size,
            callback_manager=callback_manager,
            model_name=model_name,
            tokenizer_name=tokenizer_name,
            max_length=max_length,
            pooling=pooling,
            normalize=normalize,
            query_instruction=query_instruction,
            text_instruction=text_instruction,
        )
```
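For intuition, the `pooling` and `normalize` options this class configures boil down to two small post-processing steps on the token embeddings. A toy pure-Python sketch of mean pooling and L2 normalization (not LlamaIndex code; the real implementation operates on torch tensors):

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens (mask == 1)."""
    kept = [emb for emb, m in zip(token_embeddings, attention_mask) if m]
    dim = len(kept[0])
    return [sum(emb[i] for emb in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length (what normalize=True does)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Padding tokens are excluded from the average so that batch padding does not change a sentence's embedding, and unit-length vectors make cosine similarity a plain dot product.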
Ensure that the model and tokenizer are correctly set up and that the `max_length` and `pooling` parameters are properly configured. This should help in resolving the `JSONDecodeError` [1][2][3][4].
To continue talking to Dosu, mention @dosu.
This does not help, as I am using your example.
@nitskoh Looks like the very first step is to deploy the embedding server, which sounds like a paid option as described here. cc: @namin @tobi
It's not paid, but you need to deploy it somewhere before you can use it (local Docker, AWS, etc.)
Thanks for the note, @logan-markewich. Which of the deployment methods have you used?
Bug Description
I tried running the code snippet in this example https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference/
I got the following error
Version
Using the colab notebook provided
Steps to Reproduce
Running the colab notebook here
https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference/
Relevant Logs/Tracebacks
No response