redis / redis-vl-python

Redis Vector Library (RedisVL) interfaces with Redis' vector database for realtime semantic search, RAG, and recommendation systems.
https://www.redisvl.com/
MIT License

Redis semantic cache working with custom embeddings #176

Open pradeepdev-1995 opened 5 days ago

pradeepdev-1995 commented 5 days ago

I am following the official documentation scripts for semantic cache. In the following code:

from redisvl.extensions.llmcache import SemanticCache
llmcache = SemanticCache(
    name="llmcache",                     # underlying search index name
    redis_url="redis://localhost:6379",  # redis connection url string
    distance_threshold=0.2               # semantic cache distance threshold
)
llmcache.store(
    prompt="What is the capital city of France?",
    response="Paris",
    metadata={"city": "Paris", "country": "france"}
)
llmcache.check(prompt=question)[0]['response']  # 'question' is the incoming user prompt text

I have these questions,

1 - In llmcache.store(), can I store a custom vector for the prompt directly, rather than the prompt text? That custom vector could be generated by any embedding model (sentence-transformers, MiniLM-L12-v2, OpenAI embeddings, Hugging Face embeddings, etc.).

2 - Is there any embedding length limitation when storing with llmcache.store(), or can I use a vector of any length?

3 - In llmcache.check(), can I pass the vector (from sentence-transformers, MiniLM-L12-v2, OpenAI embeddings, Hugging Face embeddings, etc.) directly for semantic matching, rather than the query text?

4 - Inside llmcache.check(), which distance measure is used for finding semantic similarity? Is it cosine similarity or something else? Do we have the option to configure that?

tylerhutcherson commented 5 days ago

Great feedback @pradeepdev-1995 -- @justin-cechmanek will be able to chime in to answer your questions on how this works today. However, we are rapidly working on an update to the semantic cache class and feedback like this is very helpful.

As a quick starting point, check out the newly included CustomTextVectorizer class as a way to manage this better.

justin-cechmanek commented 5 days ago

Hi @pradeepdev-1995, I'll get straight to your questions:

1 - You can pass the vector of your choice directly to llmcache.store() as a parameter: llmcache.store(prompt='<text>', vector=[1.0, 2.0, 3.0], response='<text>'). This ensures your specified vector is used for vector comparisons. The text prompt is still stored.

2 - We don't impose any vector size limits, but all vectors must be the same size for a given search index.

3 - Similar to llmcache.store(), you can pass your own vector directly to llmcache.check(). If you do so, you don't need to pass a text prompt: llmcache.check(vector=[1.0, 2.0, 3.0]) works.

4 - The distance measure used is cosine similarity between vectors. It is not currently configurable.
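To put answers 1 and 3 together, here is a rough sketch of the vector-in flow; my_embed_function is just a placeholder for whatever embedding model you already use, and the literal values are illustrative:

from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(name="llmcache", redis_url="redis://localhost:6379")

# embed the prompt yourself with any model you like
prompt = "What is the capital city of France?"
vector = my_embed_function(prompt)  # a list of floats; must match the index's vector dimension

# store the prompt/response pair along with your own vector
llmcache.store(prompt=prompt, response="Paris", vector=vector)

# later, check the cache with a vector only -- no text prompt needed
query_vector = my_embed_function("What's France's capital?")
hits = llmcache.check(vector=query_vector)
if hits:
    print(hits[0]["response"])  # "Paris"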

As @tylerhutcherson mentioned, if you want to use your own vectorizer instead of one of the several we support, you can use the CustomTextVectorizer class to wrap your embedding function. Pass that vectorizer to the SemanticCache constructor and it will handle embedding of prompts in both store() and check().

Is configuring the distance metric to something other than cosine similarity a feature you would like to have implemented?

pradeepdev-1995 commented 5 days ago

@justin-cechmanek Thanks for the detailed answer. One clarification regarding the CustomTextVectorizer class: I am already using Hugging Face embedding models to create embeddings outside of the Redis semantic cache, via the corresponding native library and models. I can directly store those generated embeddings (llmcache.store()) and search with them (llmcache.check()) as you described above, right? Then why should I use the CustomTextVectorizer class here?

justin-cechmanek commented 5 days ago

CustomTextVectorizer is required because the cache builds an index schema around the vectorizer. Without it, you can still pass your own vectors to check() and store(), but they must have the same dimension as the embeddings generated by the default vectorizer, which is currently 768.

# without CustomTextVectorizer
from redisvl.extensions.llmcache import SemanticCache

cache = SemanticCache()  # default vectorizer is created internally (768-dim embeddings)

prompt_1 = 'your prompt 1'
vector_1 = my_embed_function(prompt_1)  # your own embedding function
response = llm_call(prompt_1)           # your own LLM call
cache.store(prompt=prompt_1, response=response, vector=vector_1)  # fails if vector_1 has the wrong dimensions

prompt_2 = 'your prompt 2'
vector_2 = my_embed_function(prompt_2)

res = cache.check(vector=vector_2)  # fails if vector_2 has the wrong dimensions

The recommended approach is:

# with CustomTextVectorizer
from redisvl.extensions.llmcache import SemanticCache
from redisvl.utils.vectorize import CustomTextVectorizer

custom_vectorizer = CustomTextVectorizer(embed=my_embed_function)  # wrap your own embedding function
cache = SemanticCache(vectorizer=custom_vectorizer)                # index schema now matches your vector dimensions

prompt_1 = 'your prompt 1'
response = llm_call(prompt_1)
cache.store(prompt=prompt_1, response=response)  # embedding handled by the custom vectorizer

prompt_2 = 'your prompt 2'

res = cache.check(prompt=prompt_2)  # embedding handled by the custom vectorizer
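
In these examples, my_embed_function is just a placeholder for your own embedding callable: anything that takes a string and returns a list of floats will work with CustomTextVectorizer. As a rough sketch using sentence-transformers (the library and model name here are only illustrative, not something RedisVL requires):

from sentence_transformers import SentenceTransformer

# hypothetical embedding function wrapping a Hugging Face sentence-transformers model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

def my_embed_function(text: str) -> list:
    # encode() returns a numpy array; convert it to a plain list of floats
    return model.encode(text).tolist()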