zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License

[Bug]: gptcache does not seem to support gpt-3.5-turbo-16k chat generation #535

Open hueiyuan opened 1 year ago

hueiyuan commented 1 year ago

Current Behavior

When I try to use gptcache as the langchain cache, I get the error below:

File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 361, in acall
    raise e
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 355, in acall
    await self._acall(inputs, run_manager=run_manager)
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/agents/agent.py", line 1088, in _acall
    next_step_output = await self._atake_next_step(
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/agents/agent.py", line 932, in _atake_next_step
    output = await self.agent.aplan(
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/agents/agent.py", line 477, in aplan
    full_output = await self.llm_chain.apredict(callbacks=callbacks, **full_inputs)
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/llm.py", line 272, in apredict
    return (await self.acall(kwargs, callbacks=callbacks))[self.output_key]
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 361, in acall
    raise e
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 355, in acall
    await self._acall(inputs, run_manager=run_manager)
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/llm.py", line 237, in _acall
    response = await self.agenerate([inputs], run_manager=run_manager)
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chains/llm.py", line 115, in agenerate
    return await self.llm.agenerate_prompt(
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chat_models/base.py", line 424, in agenerate_prompt
    return await self.agenerate(
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chat_models/base.py", line 384, in agenerate
    raise exceptions[0]
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/chat_models/base.py", line 495, in _agenerate_with_cache
    return ChatResult(generations=cache_val)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1076, in pydantic.main.validate_model
  File "pydantic/fields.py", line 895, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 928, in pydantic.fields.ModelField._validate_sequence_like
  File "pydantic/fields.py", line 1094, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 884, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 1101, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 1157, in pydantic.fields.ModelField._apply_validators
  File "pydantic/class_validators.py", line 337, in pydantic.class_validators._generic_validator_basic.lambda13
  File "pydantic/main.py", line 719, in pydantic.main.BaseModel.validate
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/load/serializable.py", line 75, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/Users/xxx/Library/Python/3.9/lib/python/site-packages/langchain/schema/output.py", line 61, in set_text
    values["text"] = values["message"].content
KeyError: 'message'

It seems that the current gptcache does not support the chat-generation-related API. Could you confirm how to handle and fix this?
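
From the traceback, langchain's GPTCache wrapper appears to return cache hits as plain Generation objects, and _agenerate_with_cache then passes them to ChatResult, whose ChatGeneration validator reads values["message"].content. A minimal sketch reproducing the same KeyError without gptcache involved (assuming langchain 0.0.281):

from langchain.schema import ChatResult, Generation

# A cache hit is deserialized as plain Generation objects with no chat message.
cached = [Generation(text="cached answer")]

# chat_models/base.py then does this; it raises KeyError: 'message' because
# ChatGeneration's set_text validator looks up values["message"].content.
ChatResult(generations=cached)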

Environment

gptcache==0.1.39.1
langchain==0.0.281

Init cache example code

import hashlib

import langchain
from langchain.cache import GPTCache
from gptcache import Cache
from gptcache.adapter.api import init_similar_cache

# langchain's GPTCache wrapper calls this init function with a gptcache Cache
# object and an LLM string; openai_embedding and data_manager are defined
# elsewhere in my project.
def init_llm_cache(cache_obj: Cache, llm: str):
    hashed_llm = hashlib.sha256(llm.encode()).hexdigest()
    init_similar_cache(
        cache_obj=cache_obj,
        data_dir=f"similar_cache_{hashed_llm}",
        embedding=openai_embedding,
        data_manager=data_manager,
    )

langchain.llm_cache = GPTCache(init_llm_cache)

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

SimFG commented 1 year ago

It seems that langchain is incompatible with gptcache.

hueiyuan commented 1 year ago

@SimFG According to this question from Microsoft, the current GPT-3.5 Turbo and GPT-3.5 Turbo 16k models only support the chat completions API, so we can only use chat-based models as the LLM when building chains and agents.

However, this gptcache issue contains some discussion indicating that chat generation is not supported.

Tracing the error message, the model is indeed using the chat completions API to package the message, and the current gptcache does not handle this format.

I think this is a likely cause of the incompatibility.
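
A possible workaround until this is fixed upstream might be to subclass langchain's GPTCache wrapper so chat results round-trip as ChatGeneration objects. Below is a minimal sketch, not a tested fix: the ChatGPTCache name is hypothetical, and it assumes the wrapper's _get_gptcache helper plus gptcache's get/put adapter API behave as in langchain 0.0.281 / gptcache 0.1.39.1.

import json
from typing import Optional, Sequence

import langchain
from gptcache.adapter.api import get, put
from langchain.cache import GPTCache
from langchain.schema import AIMessage, ChatGeneration, Generation


class ChatGPTCache(GPTCache):
    """Hypothetical GPTCache variant that preserves the chat message on round-trip."""

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        _gptcache = self._get_gptcache(llm_string)
        res = get(prompt, cache_obj=_gptcache)
        if res is None:
            return None
        # Rebuild ChatGeneration objects so ChatResult validation finds a message.
        return [ChatGeneration(message=AIMessage(content=text)) for text in json.loads(res)]

    def update(self, prompt: str, llm_string: str, return_val: Sequence[Generation]) -> None:
        _gptcache = self._get_gptcache(llm_string)
        # Store only the generated text; lookup wraps it back into an AIMessage.
        put(prompt, json.dumps([gen.text for gen in return_val]), cache_obj=_gptcache)


# init_llm_cache is the init function from the snippet above.
langchain.llm_cache = ChatGPTCache(init_llm_cache)

This only round-trips the assistant's text, which is enough for plain chat completions; any extra generation metadata would be lost.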

SimFG commented 1 year ago

@hueiyuan Can you show me full demo code for this case?

hueiyuan commented 1 year ago

@SimFG Here is my sample demo code for your reference; you will need to replace the config-related dict variables:

### GPTCache refactor
import os
import hashlib

import openai  # pylint: disable=C0413
import numpy as np

from gptcache.embedding.base import BaseEmbedding
from gptcache.utils import import_openai

import_openai()

AZURE_OPENAI_CONF = {
    "embedding_model_name":"....",
    "embedding_deployment_name":"....",
    "api_key":"....",
    "api_endpoint":"....",
    "api_version":"....",
    "api_type":"...."
}

class GPTCacheAzureOpenAIEmbedding(BaseEmbedding):
    """Generate a text embedding for the given text using Azure OpenAI.

    :param model: model name, defaults to 'text-embedding-ada-002'.
    :type model: str
    :param api_key: Azure OpenAI API key. When the parameter is not specified, the key is loaded from openai.api_key or the OPENAI_API_KEY environment variable if available.
    :type api_key: str

    Example:
        .. code-block:: python

            test_sentence = 'Hello, world.'
            encoder = GPTCacheAzureOpenAIEmbedding()
            embed = encoder.to_embeddings(test_sentence)
    """

    def __init__(
        self, 
        model: str = AZURE_OPENAI_CONF["embedding_model_name"], 
        deployment_id: str = AZURE_OPENAI_CONF["embedding_deployment_name"],
        api_key: str = AZURE_OPENAI_CONF["api_key"], 
        api_base: str = AZURE_OPENAI_CONF["api_endpoint"],
        api_version: str = AZURE_OPENAI_CONF["api_version"],
        api_type: str = AZURE_OPENAI_CONF["api_type"]
    ):
        if not api_key:
            if openai.api_key:
                api_key = openai.api_key
            else:
                api_key = os.getenv("OPENAI_API_KEY")
        if not api_base:
            if openai.api_base:
                api_base = openai.api_base
            else:
                api_base = os.getenv("OPENAI_API_BASE")

        openai.api_key = api_key
        openai.api_base = api_base
        openai.api_type = api_type
        openai.api_version = api_version

        self.api_base = api_base  # don't override all of openai as we may just want to override for say embeddings
        self.model = model
        self.deployment_id = deployment_id

        if model in self.dim_dict():
            self.__dimension = self.dim_dict()[model]
        else:
            self.__dimension = None

    def to_embeddings(self, data, **_):
        """Generate embedding given text input

        :param data: text in string.
        :type data: str

        :return: a text embedding in shape of (dim,).
        """
        sentence_embeddings = openai.Embedding.create(
            model=self.model, 
            input=data, 
            api_base=self.api_base,
            deployment_id=self.deployment_id
        )
        return np.array(sentence_embeddings["data"][0]["embedding"]).astype("float32")

    @property
    def dimension(self):
        """Embedding dimension.

        :return: embedding dimension
        """
        if not self.__dimension:
            foo_emb = self.to_embeddings("foo")
            self.__dimension = len(foo_emb)
        return self.__dimension

    @staticmethod
    def dim_dict():
        return {"text-embedding-ada-002": 1536}

import langchain
from langchain.cache import GPTCache

from gptcache import Cache, Config
from gptcache.adapter.api import init_similar_cache
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

LLM_CACHING_CONF = {
    "gptcache_store_conf": {
        "connect_string": "<postgresql-connect_string>",
        "maximum_text_length": 65535
    },
    "similarity_threshold": 0.8,
    "cache_eviction": "LRU"
}

class LLMGPTCaching:
    def __init__(self):
        self.cache_conf = Config(
            similarity_threshold=LLM_CACHING_CONF["similarity_threshold"]
        )
        self.cache_openai_encoder = GPTCacheAzureOpenAIEmbedding()

        cache_base = CacheBase(
            'postgresql',
            sql_url=LLM_CACHING_CONF["gptcache_store_conf"]["connect_string"],
            table_len_config={
                "question_question": LLM_CACHING_CONF["gptcache_store_conf"]["maximum_text_length"],
                "answer_answer": LLM_CACHING_CONF["gptcache_store_conf"]["maximum_text_length"],
                "session_id": LLM_CACHING_CONF["gptcache_store_conf"]["maximum_text_length"],
                "dep_name": LLM_CACHING_CONF["gptcache_store_conf"]["maximum_text_length"],
                "dep_data": LLM_CACHING_CONF["gptcache_store_conf"]["maximum_text_length"]
            }
        )

        vector_base = VectorBase(
            "pgvector",
            url=LLM_CACHING_CONF["gptcache_store_conf"]["connect_string"],
            collection_name="llm_cache",
            dimension=self.cache_openai_encoder.dimension
        )

        self.data_manager = get_data_manager(
            cache_base=cache_base,
            vector_base=vector_base,
            eviction=LLM_CACHING_CONF["cache_eviction"]
        )

    def init_llm_cache(self, cache_obj: Cache, llm: str):
        hashed_llm = hashlib.sha256(llm.encode()).hexdigest()

        init_similar_cache(
            cache_obj=cache_obj,
            data_dir=f"similar_cache_{hashed_llm}",
            embedding=self.cache_openai_encoder,
            data_manager=self.data_manager,
            evaluation=SearchDistanceEvaluation(),
            config=self.cache_conf
        )

llm_caching = LLMGPTCaching()
langchain.llm_cache = GPTCache(llm_caching.init_llm_cache)

from langchain.chat_models import AzureChatOpenAI
openai_conf = {
    "gpt_model_name": "....",
    "gpt_deployment_name": "....",
    "api_version": "....",
    "api_endpoint": "....",
    "api_type": "....",
    "api_key": "....",
    "temperature": "....",
}

azure_openai_model = AzureChatOpenAI(
    model_name=openai_conf["gpt_model_name"],
    deployment_name=openai_conf["gpt_deployment_name"],
    openai_api_version=openai_conf["api_version"],
    openai_api_base=openai_conf["api_endpoint"],
    openai_api_type=openai_conf["api_type"],
    openai_api_key=openai_conf["api_key"],
    temperature=openai_conf["temperature"]
)

from langchain.agents.structured_chat.base import StructuredChatAgent
from langchain.chains import LLMChain
from langchain.agents import AgentExecutor
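
# current_tools, AGENT_PREFIX, HUMAN_MESSAGE_TEMPLATE, AGENT_SUFFIX, and
# AGENT_FORMAT_INSTRUCTIONS are defined elsewhere in my project.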

agent_prompt = StructuredChatAgent.create_prompt(
    tools=current_tools,
    prefix=AGENT_PREFIX,
    human_message_template=HUMAN_MESSAGE_TEMPLATE,
    suffix=AGENT_SUFFIX,
    format_instructions=AGENT_FORMAT_INSTRUCTIONS,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)

agent_llm_chain = LLMChain(
    llm=azure_openai_model,
    prompt=agent_prompt,
    # verbose=True
)

agent = StructuredChatAgent(
    llm_chain=agent_llm_chain,
    tools=current_tools,
    early_stopping_method="force",
    # verbose=True
)

agent_chain = AgentExecutor.from_agent_and_tools(
    agent=agent,
    tools=current_tools,
    # verbose=True, 
)

agent_chain.run(".......")