Closed FrancescoSaverioZuppichini closed 1 week ago
Answer generated by a 🤖
I understand that you're looking to modify the caching mechanism in the library so that it only caches the prompt, rather than the prompt and the language model generated text. This would allow you to avoid making a request to OpenAI to retrieve the language model text.
Here's a potential solution: You could modify the `lookup` and `update` methods of the `BaseCache` class and its subclasses. Currently, these methods use both the `prompt` and `llm_string` to create a key for caching. You can change this to use only the `prompt`.
For instance, in the `BaseCache` class, you can change the method signatures of `lookup` and `update` to accept only `prompt`. Then, in the `InMemoryCache`, `SQLAlchemyCache`, `RedisCache`, `RedisSemanticCache`, `GPTCache`, and `MomentoCache` classes, you can modify the `lookup` and `update` methods to use only `prompt`.
Here's an example of how you might modify the `InMemoryCache` class:
```python
class InMemoryCache(BaseCache):
    """Cache that stores things in memory."""

    def __init__(self) -> None:
        self._cache = {}

    def lookup(self, prompt: str) -> Optional[RETURN_VAL_TYPE]:
        """Look up based on prompt."""
        return self._cache.get(prompt, None)

    def update(self, prompt: str, return_val: RETURN_VAL_TYPE) -> None:
        """Update cache based on prompt."""
        self._cache[prompt] = return_val
```
Please note that this change will make the cache key based only on the prompt, and not on the language model or its settings. This means that if you use the same prompt with different language models or settings, the cache will return the same result. If this is not what you want, you may need to consider a different caching strategy.
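To make that caveat concrete, here is a tiny self-contained sketch (the `PromptOnlyDict` name and shape are illustrative, not LangChain API): a prompt-only key returns the same cached value no matter which model string the lookup is made under.

```python
class PromptOnlyDict:
    """Toy prompt-keyed cache; llm_string is accepted but ignored."""

    def __init__(self):
        self._cache = {}

    def lookup(self, prompt, llm_string):
        # The llm_string parameter never influences the key
        return self._cache.get(prompt)

    def update(self, prompt, llm_string, return_val):
        self._cache[prompt] = return_val


cache = PromptOnlyDict()
cache.update("Tell me a joke", "model-a", ["joke from model A"])

# A lookup under a *different* model string still hits the model-A entry:
print(cache.lookup("Tell me a joke", "model-b"))  # ['joke from model A']
```

That silent cross-model hit is exactly the behavior you must decide is acceptable before dropping `llm_string` from the key.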
If you believe this solution is correct and you're comfortable with the changes, I encourage you to make a pull request. This would be a valuable contribution to the library.
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
I am not sure, since I think those methods will only be called after the prediction is made.
@FrancescoSaverioZuppichini so you are suggesting a different option for users: caching on the prompt only, rather than prompt + llm string? Modifying the existing `cache.py` base class has inherited impacts across the other concrete classes. Rather than modifying it, creating a separate prompt-only cache and adapting the rest of the integrations would enable this functionality as a standalone feature instead of conflicting with the existing behavior. At least this is what I could think of.
By the way, I tried this out. But I was not able to find how `langchain.llm_cache` works, as it expects both the prompt and the LLM string. In that case it becomes evident that if there is to be a prompt-only implementation, the behavior of `langchain.llm_cache` has to be modified as well. So for now I kept the same construct but changed the key to only the prompt.
```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence

from langchain.schema import Generation

RETURN_VAL_TYPE = Sequence[Generation]


class BaseCache(ABC):
    """Base interface for cache."""

    @abstractmethod
    def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
        """Look up based on prompt and llm_string."""

    @abstractmethod
    def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
        """Update cache based on prompt and llm_string."""

    @abstractmethod
    def clear(self, **kwargs: Any) -> None:
        """Clear cache that can take additional keyword arguments."""


class PromptOnlyCache(BaseCache):
    def __init__(self):
        self.cache = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        # Ignore llm_string, only use prompt as key
        return self.cache.get(prompt)

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        # Ignore llm_string, only use prompt as key
        self.cache[prompt] = return_val

    def clear(self, **kwargs: Any) -> None:
        self.cache.clear()
```
```python
import os

import langchain
from langchain.llms import OpenAI

os.environ["OPENAI_API_KEY"] = ""

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

# Use the new PromptOnlyCache
langchain.llm_cache = PromptOnlyCache()

# The first time, it is not yet in cache, so it should take longer
print("First time:")
%timeit llm("Tell me a joke")

# The second time it is, so it goes faster
print("Second time:")
%timeit llm("Tell me a joke")

print("Third time:")
%timeit llm("Tell me a joke")
```
```
First time:
113 µs ± 43 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Second time:
111 µs ± 27.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Third time:
95.2 µs ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Thanks a lot for the reply. I am assuming that people would also like to cache their input so they don't have to actually make requests to OpenAI - I can for sure add that part in the code (maybe in the current implementation you should have some sort of namespace to avoid a `flushdb` that may destroy a vector db), but I was wondering if there is some plan or some use case to support this.
So this will prevent actually making a request to OpenAI? (thanks a lot btw 🙏)
I guess so. A screenshot of the two runs: the results are identical, more like seeded.
If I disable the internet, it doesn't run.
```python
from time import perf_counter
from typing import Any, Optional

import langchain
from langchain.cache import BaseCache
from langchain.llms import OpenAI


class PromptOnlyCache(BaseCache):
    def __init__(self):
        # Pre-seed the cache so this prompt never has to reach OpenAI
        self.cache = {"Tell me a joke": "foo"}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        # Ignore llm_string, only use prompt as key
        return self.cache.get(prompt)

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        # Ignore llm_string, only use prompt as key
        self.cache[prompt] = return_val

    def clear(self, **kwargs: Any) -> None:
        self.cache.clear()


def timeit(func):
    def _inner(*args, **kwargs):
        start = perf_counter()
        res = func(*args, **kwargs)
        end = perf_counter()
        print(f"Elapsed = {(end - start) * 1000:.4f} ms")
        return res
    return _inner


# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

# Use the new PromptOnlyCache
langchain.llm_cache = PromptOnlyCache()

timeit(llm)("Tell me a joke")
timeit(llm)("Tell me a joke")
timeit(llm)("Tell me a joke")
```
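A network-free way to check whether a prompt-keyed cache really short-circuits the request is to count calls against a fake model. This is a hedged sketch — `FakeLLM` and `with_cache` are made up for illustration, not LangChain API — but it demonstrates the property being claimed: only the first call reaches the model.

```python
class FakeLLM:
    """Stands in for a network-backed model; counts real 'requests'."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"generated: {prompt}"


def with_cache(llm, cache):
    """Wrap a model so the cache is consulted before the model is called."""
    def _cached(prompt):
        hit = cache.get(prompt)
        if hit is not None:
            return hit
        result = llm(prompt)
        cache[prompt] = result
        return result
    return _cached


llm = FakeLLM()
cached_llm = with_cache(llm, {})

cached_llm("Tell me a joke")
cached_llm("Tell me a joke")
cached_llm("Tell me a joke")

print(llm.calls)  # 1 -- only the first call reached the model
```

If the model were really unreachable (internet disabled), only the first call would fail; any prompt already in the cache would still be served.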
Hi, @FrancescoSaverioZuppichini! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you requested a feature to allow caching of only the prompt in the current implementation of `BaseCache`. There have been some discussions on how to implement this, with suggestions to modify the `lookup` and `update` methods of the `BaseCache` class and its subclasses to use only the prompt. Another suggestion was to create a separate prompt-only cache to avoid conflicting with the existing behavior.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!
sup @dosu-beta
Hey guys @FrancescoSaverioZuppichini. I'm new to LangChain and I'm trying to connect the chatbot I'm building with LangChain to a DB like Postgres, SQL Server, or MySQL, which I have done so far, but I was wondering how I can implement semantic caching, as the documentation only shows exact caching.
You could use GPTCache, which uses semantic caching under the hood, but as of today it doesn't work. If you could fix the issue, that would be great :D. I'm looking into the library right now to see if I can make a pull request.
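For intuition, here is a toy, dependency-free sketch of the idea behind semantic caching: look up by vector similarity instead of exact string match. Real systems like GPTCache use an actual embedding model and a vector store; the `embed` function below is a deliberately crude stand-in, and `ToySemanticCache` is a made-up name, not a real API.

```python
import math


def embed(text):
    # Hypothetical bag-of-letters "embedding", for illustration only;
    # a real system would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class ToySemanticCache:
    def __init__(self, threshold=0.95):
        self.entries = []  # list of (embedding, value) pairs
        self.threshold = threshold

    def lookup(self, prompt):
        q = embed(prompt)
        best, best_sim = None, 0.0
        for vec, val in self.entries:
            sim = cosine(q, vec)
            if sim > best_sim:
                best, best_sim = val, sim
        # Only count it as a hit if similarity clears the threshold
        return best if best_sim >= self.threshold else None

    def update(self, prompt, value):
        self.entries.append((embed(prompt), value))


cache = ToySemanticCache(threshold=0.95)
cache.update("Tell me a joke", "cached joke")

print(cache.lookup("Tell me a joke!"))     # near-duplicate phrasing -> hit
print(cache.lookup("Summarize this PDF"))  # unrelated prompt -> None
```

The threshold is the key tuning knob: too low and unrelated prompts collide, too high and it degenerates into exact-match caching.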
Did you get a solution to your problem?
Feature request
Hi there 👋
Thanks a lot for the awesome library. The current implementation of `BaseCache` stores the prompt + the LLM-generated text as the key. This means that I am not really caching, since I'll have to make a request to OpenAI to get the LLM text.
Motivation
I'd like to cache a prompt
Your contribution
I am willing to contribute, but you'll need to explain to me how :)