Closed — NarasimmanSaravana1994 closed this issue 3 months ago
The issue you're facing is that the `EmbeddingEndEvent` expects a list of lists of floats (`List[List[float]]`) for the embeddings, but it's receiving a single list of floats instead. To fix this, adjust the `_get_text_embedding` method in your `InstructorEmbeddings` class so that it returns a list of lists. Here's the corrected method:
```python
def _get_text_embedding(self, text: str) -> List[List[float]]:
    embeddings = self._model.encode([[self._instruction, text]])
    return [embeddings[0]]  # Wrap the embedding in a list
```
This change wraps the embedding in an additional list, matching the expected format. If you're using a similar pattern for query embeddings, apply the same fix to those methods as well. This should resolve the validation error you're encountering.
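To make the expected shape concrete, here is a minimal, self-contained sketch. The `check_shape` helper is a stand-in for illustration only, not the actual llama-index validator:

```python
from typing import List


def check_shape(embeddings) -> bool:
    # Stand-in for the event's validation: every element must itself
    # be a list of floats, i.e. the value is a batch of embeddings.
    return all(
        isinstance(e, list) and all(isinstance(x, float) for x in e)
        for e in embeddings
    )


single = [0.1, 0.2, 0.3]  # shape returned by _get_text_embedding
wrapped = [single]        # shape the event expects: List[List[float]]

assert not check_shape(single)  # a bare vector fails the check
assert check_shape(wrapped)     # wrapping it in a list passes
```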
```python
class InstructorEmbeddings(BaseEmbedding):
    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls) -> str:
        return "instructor"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[List[float]]:
        embeddings = self._model.encode([[self._instruction, text]])
        return [embeddings[0]]  # Wrap the embedding in a list
```
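One additional pitfall worth checking (an assumption, not something confirmed in this thread): `encode()` returns a numpy array, so `embeddings[0]` holds numpy scalars (e.g. `float32`) rather than native Python floats, which can trip strict validation. If the error persists after wrapping, converting to plain floats is worth trying, e.g.:

```python
from typing import List

import numpy as np


def to_plain_floats(vector: np.ndarray) -> List[float]:
    # ndarray.tolist() converts numpy scalars (e.g. float32)
    # into native Python floats, element by element.
    return vector.tolist()


vec = np.array([0.1, 0.2, 0.3], dtype=np.float32)
emb = to_plain_floats(vec)
assert all(isinstance(x, float) for x in emb)
```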
Still, I faced the same issue even after changing the code as in the comment above.
Bug Description
I created a custom embedding model using InstructorEmbedding (reference: https://docs.llamaindex.ai/en/stable/examples/embeddings/custom_embeddings/).
After the embedding model was created, I validated it with some text and an exception occurred.
Code:
```python
import os
from typing import Any, List

import openai
from InstructorEmbedding import INSTRUCTOR
from llama_index.core.bridge.pydantic import PrivateAttr
from llama_index.core.embeddings import BaseEmbedding


class InstructorEmbeddings(BaseEmbedding):
    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = PrivateAttr()
    # ... (rest of the class as defined above)


embed_model = InstructorEmbeddings(embed_batch_size=2)
embeddings = embed_model.get_text_embedding('my name is narasimman')
embeddings
```
Version
llama-index 0.10.9
Steps to Reproduce
The code to reproduce is shared in the comments mentioned above.
Relevant Logs/Tracebacks