langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Caching: allow caching on the prompt only #7141

Closed: FrancescoSaverioZuppichini closed this issue 1 week ago

FrancescoSaverioZuppichini commented 1 year ago

Feature request

Hi there 👋

Thanks a lot for the awesome library. The current implementation of BaseCache stores the prompt + the LLM-generated text as the key.

This means I am not really caching, since I still have to make a request to OpenAI to get the LLM text.

Motivation

I'd like to cache a prompt

Your contribution

I am willing to contribute, but you'll need to explain to me how :)

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

I understand that you're looking to modify the caching mechanism in the library so that it only caches the prompt, rather than the prompt and the language model generated text. This would allow you to avoid making a request to OpenAI to retrieve the language model text.

Here's a potential solution: You could modify the lookup and update methods of the BaseCache class and its subclasses. Currently, these methods use both the prompt and llm_string to create a key for caching. You can change this to use only the prompt.

For instance, in the BaseCache class, you can change the method signatures of lookup and update to accept only prompt. Then, in the InMemoryCache, SQLAlchemyCache, RedisCache, RedisSemanticCache, GPTCache, and MomentoCache classes, you can modify the lookup and update methods to use only prompt.

Here's an example of how you might modify the InMemoryCache class:

class InMemoryCache(BaseCache):
    """Cache that stores things in memory, keyed on the prompt alone."""

    def __init__(self) -> None:
        self._cache: Dict[str, RETURN_VAL_TYPE] = {}

    def lookup(self, prompt: str) -> Optional[RETURN_VAL_TYPE]:
        """Look up based on prompt."""
        return self._cache.get(prompt, None)

    def update(self, prompt: str, return_val: RETURN_VAL_TYPE) -> None:
        """Update cache based on prompt."""
        self._cache[prompt] = return_val

Please note that this change will make the cache key based only on the prompt, and not on the language model or its settings. This means that if you use the same prompt with different language models or settings, the cache will return the same result. If this is not what you want, you may need to consider a different caching strategy.
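
For contrast, the stock InMemoryCache keys on the (prompt, llm_string) pair, so the same prompt sent with a different model or different settings misses the cache. A simplified sketch of that existing behavior (illustrative only, not the exact source):

from typing import Dict, Optional, Sequence, Tuple

from langchain.schema import Generation

RETURN_VAL_TYPE = Sequence[Generation]

class DefaultStyleInMemoryCache:
    """Illustrative sketch: cache keyed on (prompt, llm_string)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], RETURN_VAL_TYPE] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
        # A different model or different settings yields a different llm_string,
        # and therefore a different cache key.
        return self._cache.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
        self._cache[(prompt, llm_string)] = return_val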

If you believe this solution is correct and you're comfortable with the changes, I encourage you to make a pull request. This would be a valuable contribution to the library.

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

FrancescoSaverioZuppichini commented 1 year ago

I am not sure, since I think those methods will only be called after the prediction is made.
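
For reference, the LLM layer consults the cache before the request goes out: lookup runs first, and update only runs after a miss, so a cache hit never reaches the provider. A simplified sketch of that flow (a hypothetical helper, not the actual langchain source):

from typing import Callable, Optional, Sequence

from langchain.schema import Generation

def generate_with_cache(
    prompt: str,
    llm_string: str,
    cache,
    call_provider: Callable[[str], Sequence[Generation]],
) -> Sequence[Generation]:
    # Hypothetical helper: call_provider stands in for the real OpenAI request.
    cached: Optional[Sequence[Generation]] = cache.lookup(prompt, llm_string)
    if cached is not None:
        return cached  # cache hit: the provider is never contacted
    generations = call_provider(prompt)  # cache miss: real API request
    cache.update(prompt, llm_string, generations)
    return generations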

rjarun8 commented 1 year ago

@FrancescoSaverioZuppichini so you are suggesting a separate option for users to cache on the prompt only, rather than prompt + llm string? Modifying the existing cache.py base class would have knock-on impacts across the other concrete classes. Rather than modifying it, creating a separate prompt-only cache and adapting the rest of the integrations would enable this as a standalone feature instead of conflicting with the existing behavior. At least that is what I could think of.

rjarun8 commented 1 year ago

By the way, I tried this out. But I was not able to figure out how langchain.llm_cache works, since it expects both the prompt and the LLM string. In that case it becomes evident that, for a prompt-only implementation, the behavior of langchain.llm_cache has to be modified as well. So for now I kept the same method signatures but changed the key to use only the prompt.

from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence
from langchain.schema import Generation
RETURN_VAL_TYPE = Sequence[Generation]

class BaseCache(ABC):
    """Base interface for cache."""

    @abstractmethod
    def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
        """Look up based on prompt and llm_string."""

    @abstractmethod
    def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
        """Update cache based on prompt and llm_string."""

    @abstractmethod
    def clear(self, **kwargs: Any) -> None:
        """Clear cache that can take additional keyword arguments."""

class PromptOnlyCache(BaseCache):
    def __init__(self):
        self.cache = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        # Ignore llm_string, only use prompt as key
        return self.cache.get(prompt)

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        # Ignore llm_string, only use prompt as key
        self.cache[prompt] = return_val

    def clear(self, **kwargs: Any) -> None:
        self.cache.clear()

import langchain
from langchain.llms import OpenAI

import os
os.environ["OPENAI_API_KEY"] = ""

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

# Use the new PromptOnlyCache
langchain.llm_cache = PromptOnlyCache()

# The first time, it is not yet in cache, so it should take longer
print("First time:")
%timeit llm("Tell me a joke")

# The second time it is, so it goes faster
print("Second time:")
%timeit llm("Tell me a joke")

# The third time it is, so it goes faster
print("third time:")
%timeit llm("Tell me a joke")
Output:

First time:
113 µs ± 43 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Second time:
111 µs ± 27.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
third time:
95.2 µs ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

FrancescoSaverioZuppichini commented 1 year ago

> @FrancescoSaverioZuppichini so you are suggesting a separate option for users to cache on the prompt only, rather than prompt + llm string? [...]

Thanks a lot for the reply. I am assuming that people would also like to cache their input so they don't actually have to make requests to OpenAI. I can for sure just add that part in the code (in the current implementation you should probably also have some sort of namespace, so that a flushdb doesn't destroy a vector DB living in the same Redis instance), but I was wondering if there is a plan or a use case for supporting this.
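
For illustration, a minimal sketch of what such a namespaced cache could look like, assuming the redis-py client (the class name and key scheme here are hypothetical): clear() removes only keys under its own prefix instead of calling FLUSHDB, so other data in the same Redis instance, such as a vector store, stays untouched.

import hashlib
from typing import Any, Optional

import redis

class NamespacedRedisPromptCache:
    """Hypothetical prompt-only cache confined to a single key prefix."""

    def __init__(self, client: redis.Redis, namespace: str = "llm-cache") -> None:
        self._client = client
        self._namespace = namespace

    def _key(self, prompt: str) -> str:
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return f"{self._namespace}:{digest}"

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # llm_string is deliberately ignored: the prompt alone is the key.
        value = self._client.get(self._key(prompt))
        return value.decode("utf-8") if value is not None else None

    def update(self, prompt: str, llm_string: str, return_val: str) -> None:
        self._client.set(self._key(prompt), return_val)

    def clear(self, **kwargs: Any) -> None:
        # Delete only this namespace's keys, never FLUSHDB.
        for key in self._client.scan_iter(match=f"{self._namespace}:*"):
            self._client.delete(key)

A real integration would serialize the sequence of Generation objects (for example as JSON) rather than a raw string, but the namespacing idea is the same.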

FrancescoSaverioZuppichini commented 1 year ago

> By the way, I tried this out. [...] (the code example quoted above)

so this will prevent actually making a request to OpenAI? (thanks a lot btw 🙏)

rjarun8 commented 1 year ago

> so this will prevent actually making a request to OpenAI? (thanks a lot btw 🙏)

I guess so. A screenshot of the two runs is attached below; the results are identical, almost as if seeded.

[Screenshots: Screenshot_20230705-005439_Chrome, Screenshot_20230705-005447_Chrome]

FrancescoSaverioZuppichini commented 1 year ago

If I disable the internet, it doesn't run:

from time import perf_counter
from typing import Any, Optional

from langchain.cache import BaseCache

class PromptOnlyCache(BaseCache):
    def __init__(self):
        # Pre-seed the cache so a lookup hit never needs to reach OpenAI
        self.cache = {"Tell me a joke": "foo"}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        # Ignore llm_string, only use prompt as key
        return self.cache.get(prompt)

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        # Ignore llm_string, only use prompt as key
        self.cache[prompt] = return_val

    def clear(self, **kwargs: Any) -> None:
        self.cache.clear()

import langchain
from langchain.llms import OpenAI

def timeit(func):
    def _inner(*args, **kwargs):
        start = perf_counter()
        res = func(*args, **kwargs)
        end = perf_counter()
        print(f"Elapsed = {(end - start)*1000:.4f}")
        return res
    return _inner

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

# Use the new PromptOnlyCache
langchain.llm_cache = PromptOnlyCache()

timeit(llm)("Tell me a joke")
timeit(llm)("Tell me a joke")
timeit(llm)("Tell me a joke")
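
One likely reason the call still goes out to OpenAI: the LLM layer expects a cache hit to be a sequence of Generation objects, so a plain string like "foo" is probably treated as a miss. A sketch of seeding the cache with the expected type (same hypothetical class as above, only the seeded value changes):

from typing import Any, Optional

from langchain.cache import BaseCache
from langchain.schema import Generation

class SeededPromptOnlyCache(BaseCache):
    def __init__(self) -> None:
        # Seed with a list of Generation objects, the shape the LLM layer
        # expects back from a cache hit.
        self.cache = {"Tell me a joke": [Generation(text="foo")]}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        return self.cache.get(prompt)

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        self.cache[prompt] = return_val

    def clear(self, **kwargs: Any) -> None:
        self.cache.clear()
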
dosubot[bot] commented 9 months ago

Hi, @FrancescoSaverioZuppichini! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you requested a feature to allow caching of only the prompt in the current implementation of BaseCache. There have been some discussions on how to implement this, with suggestions to modify the lookup and update methods of the BaseCache class and its subclasses to use only the prompt. Another suggestion was to create a separate prompt-only cache to avoid conflicting with the existing behavior.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

FrancescoSaverioZuppichini commented 9 months ago

> [quotes the stale-bot message above]

sup @dosu-beta

yazanrisheh commented 6 months ago

Hey guys @FrancescoSaverioZuppichini. I'm new to LangChain, and I'm connecting my chatbot, built with LangChain, to a database like Postgres, SQL Server, or MySQL, which I have done so far. But I was wondering how I can implement semantic caching, since the documentation only shows exact caching.

theinhumaneme commented 3 months ago

> [...] I was wondering how I can implement semantic caching, since the documentation only shows exact caching.

You could use GPTCache, which does semantic caching under the hood, but as of today it doesn't work. If you could fix the issue that would be great :D. I'm looking into the library right now to see if I can make a pull request.

Did you get a solution to your problem?
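
For reference, LangChain also ships a RedisSemanticCache that matches prompts by embedding similarity rather than exact string equality. A minimal sketch of wiring it up (assumes a local Redis Stack instance and an OpenAI API key; the exact constructor arguments may differ between versions):

import langchain
from langchain.cache import RedisSemanticCache
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI

# Prompts that embed close enough to a cached one are served from the cache.
langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings(),
    score_threshold=0.2,  # how close a prompt must be to count as a hit
)

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
llm("Tell me a joke")    # miss: calls the API and populates the cache
llm("Tell me one joke")  # similar enough: can be served from the cache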