rushai-dev opened this issue 2 weeks ago
Hey @rushai-dev! I'm here to assist you with any bugs, questions, or contributions you have. Let's tackle this together while we wait for a human maintainer.
To ensure the model's computation (and therefore its output) happens on the specified device (e.g., a GPU) in the `OptimumEmbedding(BaseEmbedding)` class, you need to move the encoded inputs to that device in the `_embed` method. Here is the updated method:
```python
def _embed(self, sentences: List[str]) -> List[List[float]]:
    """Embed sentences."""
    encoded_input = self._tokenizer(
        sentences,
        padding=True,
        max_length=self.max_length,
        truncation=True,
        return_tensors="pt",
    )

    # Move the encoded input tensors to the specified device so that
    # the forward pass runs there as well
    encoded_input = {
        key: value.to(self._device) for key, value in encoded_input.items()
    }

    model_output = self._model(**encoded_input)

    if self.pooling == "cls":
        embeddings = self._cls_pooling(model_output)
    else:
        embeddings = self._mean_pooling(
            model_output, encoded_input["attention_mask"]
        )

    if self.normalize:
        import torch

        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

    return embeddings.tolist()
```
In this modification, the `encoded_input` dictionary is rebuilt so that each of its tensors is moved to the specified device via `.to(self._device)`. Because the inputs now live on that device, the forward pass and the resulting model output are computed there as well [1][2].
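For context, here is a minimal, self-contained sketch of the same pattern using a plain `transformers` model rather than the Optimum ONNX wrapper (the model name and the mean pooling below are illustrative choices, not the `OptimumEmbedding` defaults):

```python
# Illustrative sketch of device-safe embedding; model and pooling are examples.
import torch
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2").to(device)

encoded = tokenizer(
    ["hello world"], padding=True, truncation=True, return_tensors="pt"
)
# Without this line, CPU inputs hitting a CUDA model raise a device-mismatch
# RuntimeError; with it, the forward pass runs entirely on `device`.
encoded = {k: v.to(device) for k, v in encoded.items()}

with torch.no_grad():
    output = model(**encoded)

# Mean pooling over the token dimension, weighted by the attention mask
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (1, hidden_size)
```

Note that it is the inputs that get moved; the output tensors then come back on the same device automatically.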
Bug Description
https://github.com/run-llama/llama_index/blob/162f5a0523f5a4de33f8cc056ec2b24713d2ee9e/llama-index-integrations/embeddings/llama-index-embeddings-huggingface-optimum/llama_index/embeddings/huggingface_optimum/base.py#L143C9-L143C52

Change

```python
model_output = self._model(**encoded_input)
```

to

```python
model_output = self._model(**encoded_input).to(self._device)
```
Version
llama-index-embeddings-huggingface-optimum==0.2.0
Steps to Reproduce
```python
model_output = self._model(**encoded_input).to(self._device)
```
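As a hedged sketch of how this code path is normally exercised (the model name and export folder below are examples taken from the llama-index docs; actually hitting the device issue would additionally require a GPU-backed model):

```python
# Sketch of the standard OptimumEmbedding workflow; paths/model are examples.
from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

# One-time export of a Hugging Face model to ONNX format
OptimumEmbedding.create_and_save_optimum_model(
    "BAAI/bge-small-en-v1.5", "./bge_onnx"
)

# Embedding a text runs through _embed, where the tokenized inputs
# must end up on the same device as the model
embed_model = OptimumEmbedding(folder_name="./bge_onnx")
print(embed_model.get_text_embedding("Hello World!")[:4])
```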
Relevant Logs/Tracebacks
No response