run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Bug]: OptimumEmbedding(BaseEmbedding) cannot switch between CUDA and CPU #15908

Open rushai-dev opened 2 weeks ago

rushai-dev commented 2 weeks ago

Bug Description

OptimumEmbedding._embed never applies the configured device, so the embedding model cannot be switched between CUDA and CPU. See:

https://github.com/run-llama/llama_index/blob/162f5a0523f5a4de33f8cc056ec2b24713d2ee9e/llama-index-integrations/embeddings/llama-index-embeddings-huggingface-optimum/llama_index/embeddings/huggingface_optimum/base.py#L143C9-L143C52

The line

model_output = self._model(**encoded_input)

should be changed to something like

model_output = self._model(**encoded_input).to(self._device)

Version

llama-index-embeddings-huggingface-optimum==0.2.0

Steps to Reproduce

model_output = self._model(**encoded_input).to(self._device)
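
A minimal sketch of how this code path can be exercised, based on the documented OptimumEmbedding usage (the model name and export folder below are illustrative, and it assumes a machine where the device resolves to CUDA):

from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

# One-time export of a Hugging Face model to ONNX (folder name is illustrative)
OptimumEmbedding.create_and_save_optimum_model(
    "BAAI/bge-small-en-v1.5", "./bge_onnx"
)

embed_model = OptimumEmbedding(folder_name="./bge_onnx")

# _embed tokenizes the text but leaves the tensors on the CPU; if the model
# expects CUDA tensors, the forward pass fails with a device-mismatch error.
embedding = embed_model.get_text_embedding("Hello World!")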

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 2 weeks ago

Hey @rushai-dev! I'm here to assist you with any bugs, questions, or contributions you have. Let's tackle this together while we wait for a human maintainer.

To ensure the forward pass runs on the specified device (e.g., GPU) in the OptimumEmbedding(BaseEmbedding) class, you need to modify the _embed method so that the tokenized inputs are moved to that device. Here is the updated method:

def _embed(self, sentences: List[str]) -> List[List[float]]:
    """Embed sentences."""
    encoded_input = self._tokenizer(
        sentences,
        padding=True,
        max_length=self.max_length,
        truncation=True,
        return_tensors="pt",
    )

    # Move the tokenized inputs to the same device as the model
    encoded_input = {
        key: value.to(self._device) for key, value in encoded_input.items()
    }

    model_output = self._model(**encoded_input)

    if self.pooling == "cls":
        embeddings = self._cls_pooling(model_output)
    else:
        embeddings = self._mean_pooling(
            model_output, encoded_input["attention_mask"]
        )

    if self.normalize:
        import torch

        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

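    # Tensor.tolist() implicitly copies a CUDA tensor back to host memory.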
    return embeddings.tolist()

In this modification, the tensors in the encoded_input dictionary are moved to the specified device with .to(self._device) before the forward pass, so the model and its inputs are on the same device and the model output is computed there as well [1][2].
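
As an aside, the tokenizer call returns a transformers BatchEncoding, which supports .to(device) directly, so the per-tensor dict comprehension could be collapsed into a single call (a sketch of the same fix, not a behavior change):

encoded_input = self._tokenizer(
    sentences,
    padding=True,
    max_length=self.max_length,
    truncation=True,
    return_tensors="pt",
).to(self._device)  # moves input_ids, attention_mask, etc. in one step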

To continue talking to Dosu, mention @dosu.