jingk-1992 opened this issue 2 months ago
Hey @jingk-1992! 👋 I'm here to help you with any bugs, questions, or contributions you might have for the project. Let me know how I can assist you!
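The issue you're encountering comes from the data type conversion that happens when embeddings are read back from Redis. The embeddings are cached as base64-encoded byte strings and decoded with `np.frombuffer(..., dtype="float")`, which produces a NumPy float64 array. As a result, the cache-hit path returns `numpy.float64` values instead of native Python floats, and this type mismatch is what leads to the TypeError during JSON serialization.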
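You can see the conversion in isolation with the snippet below. It assumes the cache holds the base64 encoding of a float64 array, as in the `embed_query` code in this thread; the vector values are arbitrary examples.

```python
import base64

import numpy as np

# Simulate what the cache stores: base64-encoded raw bytes of a float64 array.
original = [0.1, 0.2, 0.3]
encoded = base64.b64encode(np.array(original).tobytes()).decode("utf-8")

# Decoding with dtype="float" yields a NumPy float64 array, not Python floats.
decoded = np.frombuffer(base64.b64decode(encoded), dtype="float")
print(decoded.dtype)           # float64
print(type(list(decoded)[0]))  # <class 'numpy.float64'>

# .tolist() converts every element to a native Python float.
native = decoded.tolist()
print(type(native[0]))         # <class 'float'>
```

Because the raw float64 bytes round-trip exactly, `native` compares equal to `original`. Only the element type differs between `list(decoded)` and `decoded.tolist()`.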
To fix this, convert the decoded array to a list of native Python floats before returning it. Here's a potential solution:
Retrieve the embedding from Redis:

```python
embedding = redis_client.get(embedding_cache_key)
if embedding:
    redis_client.expire(embedding_cache_key, 600)
    embedding_array = np.frombuffer(base64.b64decode(embedding), dtype="float")
    embedding_list = embedding_array.tolist()  # Convert to native Python list of floats
    return embedding_list
```
When a new embedding is computed, normalize it and make sure it is also a list of native Python floats:

```python
embedding_results = (embedding_results / np.linalg.norm(embedding_results)).tolist()
```
By converting the NumPy array to a native Python list of floats with `.tolist()`, you avoid the TypeError during JSON serialization.
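Here is a minimal check with the standard-library `json` module that shows why `.tolist()` matters for serialization. The vector values are arbitrary examples.

```python
import json

import numpy as np

vector = np.array([3.0, 4.0])
normalized = vector / np.linalg.norm(vector)  # still an ndarray

# json.dumps cannot serialize an ndarray directly.
try:
    json.dumps(normalized)
    failed = False
except TypeError:
    failed = True

# After .tolist() the payload is a plain list of Python floats.
payload = json.dumps(normalized.tolist())
print(payload)  # [0.6, 0.8]
```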
Here is the updated code snippet for the `embed_query` method:
```python
import base64
import logging

import numpy as np
from sqlalchemy.exc import IntegrityError

# redis_client, db and helper come from the surrounding Dify module.


def embed_query(self, text: str) -> list[float]:
    """Embed query text."""
    # use doc embedding cache or store if not exists
    text_hash = helper.generate_text_hash(text)
    embedding_cache_key = f'{self._model_instance.provider}_{self._model_instance.model}_{text_hash}'
    embedding = redis_client.get(embedding_cache_key)
    if embedding:
        redis_client.expire(embedding_cache_key, 600)
        embedding_array = np.frombuffer(base64.b64decode(embedding), dtype="float")
        embedding_list = embedding_array.tolist()  # Convert to native Python list of floats
        return embedding_list

    embedding_result = self._model_instance.invoke_text_embedding(
        texts=[text],
        user=self._user
    )
    embedding_results = embedding_result.embeddings[0]
    # Normalize to unit length and convert to native Python floats
    embedding_results = (embedding_results / np.linalg.norm(embedding_results)).tolist()

    try:
        # encode embedding to base64
        embedding_vector = np.array(embedding_results)
        vector_bytes = embedding_vector.tobytes()
        # Transform to Base64
        encoded_vector = base64.b64encode(vector_bytes)
        # Transform to string
        encoded_str = encoded_vector.decode("utf-8")
        redis_client.setex(embedding_cache_key, 600, encoded_str)
    except IntegrityError:
        db.session.rollback()
    except Exception:
        logging.exception('Failed to add embedding to redis')

    return embedding_results
```
With this change, both the cache-hit path and the freshly computed path return a list of native Python floats, which prevents the TypeError during JSON serialization [1].
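You can check both paths of `embed_query` end to end without Redis. The sketch below is a simplified model under stated assumptions, not Dify code. A plain dict `_cache` stands in for `redis_client` (no TTL handling), and `encode_embedding`, `decode_embedding` and `embed_query_cached` are hypothetical helper names. It shows that once `.tolist()` is applied on both paths, a cache miss and a cache hit return identical lists of native floats.

```python
import base64

import numpy as np

# Hypothetical in-memory stand-in for the Redis client (no TTL handling).
_cache: dict[str, str] = {}


def encode_embedding(vector: list[float]) -> str:
    """Serialize a vector as base64 text of its float64 bytes."""
    return base64.b64encode(np.asarray(vector, dtype=np.float64).tobytes()).decode("utf-8")


def decode_embedding(encoded: str) -> list[float]:
    """Deserialize base64 text back into a list of native Python floats."""
    return np.frombuffer(base64.b64decode(encoded), dtype=np.float64).tolist()


def embed_query_cached(key: str, raw_vector: list[float]) -> list[float]:
    """Return the cached embedding for key, or normalize, cache and return raw_vector."""
    cached = _cache.get(key)
    if cached is not None:
        return decode_embedding(cached)
    normalized = (np.asarray(raw_vector) / np.linalg.norm(raw_vector)).tolist()
    _cache[key] = encode_embedding(normalized)
    return normalized


first = embed_query_cached("demo", [3.0, 4.0])   # computed path (cache miss)
second = embed_query_cached("demo", [3.0, 4.0])  # cache hit
print(first, second)  # [0.6, 0.8] [0.6, 0.8]
```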
Self Checks
Dify version
v0.6.16
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
Background: the vector database is set to Chroma. Operations:
✔️ Expected Behavior
The embedding data is stored in Redis as Python `float` values, so it should also come back as `float` when retrieved, not as `float64`.
❌ Actual Behavior
The embedding data is stored in Redis as Python `float` values, but when retrieved it comes back as `float64`.
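`numpy.float64` is a subclass of Python's `float`. An `isinstance` check therefore does not detect the difference described above, but comparing exact types does. Here is a small sketch, where `all_native_floats` is a hypothetical helper you might use in a test:

```python
import numpy as np


def all_native_floats(values) -> bool:
    """True only if every element is exactly a Python float (subclasses excluded)."""
    return all(type(v) is float for v in values)


# Mimic the cache-hit path before the fix: list() keeps numpy.float64 elements.
retrieved = list(np.frombuffer(np.array([0.5, 0.25]).tobytes(), dtype="float"))

print(issubclass(np.float64, float))                    # True
print(all(isinstance(v, float) for v in retrieved))     # True
print(all_native_floats(retrieved))                     # False
print(all_native_floats(np.array(retrieved).tolist()))  # True
```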