Open nitskoh opened 3 months ago
Hey @nitskoh! 👋 I'm here to help you with bugs, questions, and more while you wait for a human maintainer. Let's squash those bugs together!
To resolve the `JSONDecodeError` encountered when running the text embedding inference code snippet from the LlamaIndex example, ensure that the model name you are using is correct and supported by Predibase. The error suggests that the model deployment ID might not be found at Predibase.
Here are some steps to troubleshoot and fix the issue:
1. **Verify Model Name**: Ensure that the `model_name` you are using is correct and listed among the Predibase supported models. You can refer to the Predibase documentation for the list of supported models: Predibase Models.
2. **Check API Token**: Ensure that the `PREDIBASE_API_TOKEN` is correctly set and valid.
3. **Update Base URL**: Ensure that the `base_url` is correctly set to the endpoint of the text embeddings service. The default URL is `http://127.0.0.1:8080`, but it might need to be updated to match your deployment configuration.
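The three checks above can be sketched as a small pre-flight helper. This is illustrative only: `SUPPORTED_MODELS` is a placeholder (consult the Predibase documentation for the real list), and `preflight` is a hypothetical name, not part of LlamaIndex or Predibase.

```python
import os
from urllib.parse import urlparse

# Placeholder set -- the authoritative list is in the Predibase documentation.
SUPPORTED_MODELS = {"BAAI/bge-large-en-v1.5", "BAAI/bge-base-en-v1.5"}


def preflight(model_name: str, base_url: str) -> list:
    """Return a list of configuration problems (empty if the basics look OK)."""
    problems = []
    if model_name not in SUPPORTED_MODELS:
        problems.append(f"model {model_name!r} not in supported list")
    if not os.environ.get("PREDIBASE_API_TOKEN"):
        problems.append("PREDIBASE_API_TOKEN is not set")
    if urlparse(base_url).scheme not in ("http", "https"):
        problems.append(f"base_url {base_url!r} is not an http(s) URL")
    return problems
```

Running such a check before instantiating the embedding client makes all three failure modes visible at once, instead of surfacing later as an opaque `JSONDecodeError`.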
Here is the relevant part of the code where the error is raised:
```python
import httpx


def _call_api(self, texts: List[str]) -> List[List[float]]:
    headers = {"Content-Type": "application/json"}
    if self.auth_token is not None:
        if callable(self.auth_token):
            headers["Authorization"] = self.auth_token(self.base_url)
        else:
            headers["Authorization"] = self.auth_token
    json_data = {"inputs": texts, "truncate": self.truncate_text}
    with httpx.Client() as client:
        response = client.post(
            f"{self.base_url}/embed",
            headers=headers,
            json=json_data,
            timeout=self.timeout,
        )
    return response.json()
```
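When the deployment is missing, the server typically replies with a plain-text or HTML error page rather than JSON, so `response.json()` is exactly where the `JSONDecodeError` surfaces. A defensive variant, sketched here as a hypothetical stdlib-only helper, turns that into a readable message:

```python
import json


def parse_embed_response(status_code: int, body: str):
    """Hypothetical helper: parse an /embed reply, failing with a clear
    message instead of a bare JSONDecodeError when the server is down
    or misconfigured."""
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code} from embedding server: {body[:200]}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"Server replied with non-JSON (is the deployment up?): {body[:200]}"
        ) from exc
```

In `_call_api`, calling `parse_embed_response(response.status_code, response.text)` instead of `response.json()` would make a missing deployment fail loudly with the server's actual error text.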
Additionally, if you are using the `HuggingFaceEmbedding` class from LlamaIndex, ensure that the model and tokenizer are correctly initialized. Here is a snippet from the `HuggingFaceEmbedding` class:
```python
class HuggingFaceEmbedding(BaseEmbedding):
    tokenizer_name: str = Field(description="Tokenizer name from HuggingFace.")
    max_length: int = Field(
        default=DEFAULT_HUGGINGFACE_LENGTH, description="Maximum length of input.", gt=0
    )
    pooling: Pooling = Field(default=None, description="Pooling strategy.")
    normalize: bool = Field(default=True, description="Normalize embeddings or not.")
    query_instruction: Optional[str] = Field(
        description="Instruction to prepend to query text."
    )
    text_instruction: Optional[str] = Field(
        description="Instruction to prepend to text."
    )
    cache_folder: Optional[str] = Field(
        description="Cache folder for huggingface files."
    )

    _model: Any = PrivateAttr()
    _tokenizer: Any = PrivateAttr()
    _device: str = PrivateAttr()

    def __init__(
        self,
        model_name: Optional[str] = None,
        tokenizer_name: Optional[str] = None,
        pooling: Optional[str] = None,
        max_length: Optional[int] = None,
        query_instruction: Optional[str] = None,
        text_instruction: Optional[str] = None,
        normalize: bool = True,
        model: Optional[Any] = None,
        tokenizer: Optional[Any] = None,
        embed_batch_size: int = DEFAULT_EMBED_BATCH_SIZE,
        cache_folder: Optional[str] = None,
        trust_remote_code: bool = False,
        device: Optional[str] = None,
        callback_manager: Optional[CallbackManager] = None,
    ):
        try:
            from transformers import AutoModel, AutoTokenizer
        except ImportError:
            raise ImportError(
                "HuggingFaceEmbedding requires transformers to be installed.\n"
                "Please install transformers with `pip install transformers`."
            )

        self._device = device or infer_torch_device()
        cache_folder = cache_folder or get_cache_dir()

        if model is None:  # Use model_name with AutoModel
            model_name = (
                model_name
                if model_name is not None
                else DEFAULT_HUGGINGFACE_EMBEDDING_MODEL
            )
            model = AutoModel.from_pretrained(
                model_name, cache_dir=cache_folder, trust_remote_code=trust_remote_code
            )
        elif model_name is None:  # Extract model_name from model
            model_name = model.name_or_path
        self._model = model.to(self._device)

        if tokenizer is None:  # Use tokenizer_name with AutoTokenizer
            tokenizer_name = (
                model_name or tokenizer_name or DEFAULT_HUGGINGFACE_EMBEDDING_MODEL
            )
            tokenizer = AutoTokenizer.from_pretrained(
                tokenizer_name, cache_dir=cache_folder
            )
        elif tokenizer_name is None:  # Extract tokenizer_name from tokenizer
            tokenizer_name = tokenizer.name_or_path
        self._tokenizer = tokenizer

        if max_length is None:
            try:
                max_length = int(self._model.config.max_position_embeddings)
            except AttributeError as exc:
                raise ValueError(
                    "Unable to find max_length from model config. Please specify max_length."
                ) from exc

        if not pooling:
            pooling = get_pooling_mode(model_name)
        try:
            pooling = Pooling(pooling)
        except ValueError as exc:
            raise NotImplementedError(
                f"Pooling {pooling} unsupported, please pick one in"
                f" {[p.value for p in Pooling]}."
            ) from exc

        super().__init__(
            embed_batch_size=embed_batch_size,
            callback_manager=callback_manager,
            model_name=model_name,
            tokenizer_name=tokenizer_name,
            max_length=max_length,
            pooling=pooling,
            normalize=normalize,
            query_instruction=query_instruction,
            text_instruction=text_instruction,
        )
```
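For intuition, the `pooling` and `normalize` options this class configures boil down to two small post-processing steps on the token embeddings. A toy pure-Python sketch of mean pooling and L2 normalization (not LlamaIndex code; the real implementation operates on torch tensors):

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens (mask == 1)."""
    kept = [emb for emb, m in zip(token_embeddings, attention_mask) if m]
    dim = len(kept[0])
    return [sum(emb[i] for emb in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length (what normalize=True does)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Padding tokens are excluded from the average so that batch padding does not change a sentence's embedding, and unit-length vectors make cosine similarity a plain dot product.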
Ensure that the model and tokenizer are correctly set up and that the `max_length` and `pooling` parameters are properly configured. This should help in resolving the `JSONDecodeError` [1][2][3][4].
To continue talking to Dosu, mention @dosu.
This does not help, as I am using your example.
@nitskoh Looks like the very first step is to deploy the embedding server, which sounds like a paid option as described here. cc: @namin @tobi
It's not paid, but you need to deploy it somewhere before you can use it (local Docker, AWS, etc.)
Thanks for the note, @logan-markewich. Which of the deployment methods have you used?
Bug Description
I tried running the code snippet in this example https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference/
I got the following error
Version
Using the colab notebook provided
Steps to Reproduce
Running the colab notebook here
https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference/
Relevant Logs/Tracebacks
No response