Open yithuang123 opened 2 hours ago
A similar issue has been reported in an open thread: "[Question]: pydantic.v1.error_wrappers.ValidationError: 1 validation error for EmbeddingEndEvent embeddings value is not a valid list (type=type_error.list)". The suggested solution there is to ensure that the embedding model returns valid results.
To resolve the ValidationError for EmbeddingEndEvent in version 0.11.8, you need to handle special tokens correctly. The error message suggests that the text contains the disallowed special token <|endoftext|>. You can resolve this by configuring the allowed_special or disallowed_special parameters.
Here is an example of how you can modify your code to handle this:
from llama_index import ServiceContext, QuestionAnswerPrompt

# Define your custom prompt template
QA_PROMPT_TMPL = (
    "### Human: Consider the information below \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
    "### Assistant:"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)

# Configure the service context with the custom embeddings
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    embed_model=embed_model,
    allowed_special={'<|endoftext|>'},  # Allow the special token '<|endoftext|>'
)

# Alternatively, you can disable the check for all special tokens
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    embed_model=embed_model,
    disallowed_special=(),  # Disable the check for all special tokens
)
By setting allowed_special={'<|endoftext|>'}, you explicitly allow the <|endoftext|> token. Alternatively, setting disallowed_special=() disables the check for all special tokens. Adjust these parameters based on your specific requirements.
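If it helps to see the underlying behavior, these parameters correspond to the allowed_special and disallowed_special arguments of tiktoken's encode method, which raises an error when it encounters a disallowed special token. Here is a small standalone sketch (using tiktoken directly, independent of LlamaIndex):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Some document text ... <|endoftext|>"

# Default behavior: all special tokens are disallowed, so this raises
# ValueError because the text contains '<|endoftext|>'.
try:
    enc.encode(text)
except ValueError as e:
    print(f"encode failed: {e}")

# Option 1: explicitly allow the special token; it is encoded as its
# reserved special-token id.
print(enc.encode(text, allowed_special={"<|endoftext|>"}))

# Option 2: disable the disallowed-token check entirely; the special
# token is then encoded as ordinary text.
print(enc.encode(text, disallowed_special=()))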
Additionally, ensure that your embedding events are correctly dispatched and handled. Here is an example of how embedding events are managed in the LlamaIndex framework:
from llama_index.core.instrumentation.events.embedding import EmbeddingStartEvent, EmbeddingEndEvent
from llama_index.core.base.embeddings.base import dispatcher, Embedding, EventPayload, CBEventType

def get_text_embedding(self, text: str) -> Embedding:
    """Embed a single text, dispatching start/end instrumentation events."""
    model_dict = self.to_dict()
    model_dict.pop("api_key", None)
    dispatcher.event(
        EmbeddingStartEvent(
            model_dict=model_dict,
        )
    )
    with self.callback_manager.event(
        CBEventType.EMBEDDING, payload={EventPayload.SERIALIZED: self.to_dict()}
    ) as event:
        text_embedding = self._get_text_embedding(text)
        event.on_end(
            payload={
                EventPayload.CHUNKS: [text],
                EventPayload.EMBEDDINGS: [text_embedding],
            }
        )
    dispatcher.event(
        EmbeddingEndEvent(
            chunks=[text],
            embeddings=[text_embedding],
        )
    )
    return text_embedding
Ensure that your embedding process correctly dispatches EmbeddingStartEvent and EmbeddingEndEvent with the appropriate payloads. This will help in tracking and resolving any issues related to embedding events [1][2][3].
To continue talking to Dosu, mention @dosu.
Maybe double-check the returned type of your embeddings? Seems like it's complaining about a list of lists.
Make sure the query and text embedding methods are returning a 1D list of floats.
Only get_text_embeddings (with the s) should return a list of lists.
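For illustration, here is a minimal sketch of a custom embedding with the expected return shapes (MyEmbedding and fake_vector are hypothetical names for this example, not part of LlamaIndex):

from typing import List
from llama_index.core.base.embeddings.base import BaseEmbedding

def fake_vector(text: str, dim: int = 4) -> List[float]:
    # Hypothetical stand-in for a real model call; returns a 1D list of floats.
    return [float(len(text) % 7)] * dim

class MyEmbedding(BaseEmbedding):
    def _get_query_embedding(self, query: str) -> List[float]:
        # Must be 1D: [0.1, 0.2, ...], NOT [[0.1, 0.2, ...]]
        return fake_vector(query)

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        # Also 1D: one flat vector per single text.
        return fake_vector(text)

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        # Only this batched method returns a list of lists.
        return [fake_vector(t) for t in texts]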
Yes, the error was due to the wrong return type of the query embedding method. Thanks.
Bug Description
Got this error: ValidationError: 1 validation error for EmbeddingEndEvent embeddings.0.0 Input should be a valid number [type=float_type, input_value=[0.10854760557413101, -0....-0.00011981000716332346], input_type=list]. For further information visit https://errors.pydantic.dev/2.9/v/float_type
when I run
Version
0.11.8
Steps to Reproduce
custom embedding
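For illustration, a minimal sketch of the kind of custom embedding that reproduces this error (BrokenEmbedding is a hypothetical example, not the reporter's actual code):

from typing import List
from llama_index.core.base.embeddings.base import BaseEmbedding

class BrokenEmbedding(BaseEmbedding):
    def _get_query_embedding(self, query: str) -> List[float]:
        vector = [0.1, 0.2, 0.3]
        # BUG: wrapping the vector in an extra list means EmbeddingEndEvent
        # receives embeddings=[[vector]], producing the pydantic error
        # "embeddings.0.0 Input should be a valid number".
        return [vector]  # should be: return vector

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        return [0.1, 0.2, 0.3]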
Relevant Logs/Tracebacks
No response