zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
6.89k stars 480 forks

[Bug]: The cache is not taking effect #625

Closed Songjiadong closed 2 months ago

Songjiadong commented 2 months ago

Current Behavior

import json
import time

from gptcache import Cache
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from langchain.globals import set_llm_cache
from langchain_community.embeddings.xinference import XinferenceEmbeddings
from langchain_openai import ChatOpenAI
from langchain_core.embeddings import Embeddings  # openai.resources.Embeddings is the OpenAI client resource, not the LangChain interface
from gptcache.embedding.langchain import LangChain
from config import MILVUS, XINFERENCE_URL, EMBEDDING_MODEL_NAME, CACHE_DB_URL, LLM_MODEL_NAME, API_KEY, API_BASE, \
    LLM_MAX_TOKENS, LLM_TEMPERATURE, APP_VERBOSE
from langchain_community.cache import GPTCache

def __get_content_func(data, **_):
    # Pull the serialized prompt, take the first message's content, and keep
    # only the last Human turn so it can serve as the cache key.
    prompt = data.get("prompt")
    dc = json.loads(prompt)
    result = dc[0].get("kwargs").get("content")
    split_0 = str(result.split("Human:")[-1])
    human = split_0.split('AI:')[0]
    if APP_VERBOSE is True:
        print(f"MIIC Cache:{human}")
    return human

def init_miic_cache(embeddings: Embeddings):
    # MySQL stores the scalar cache data; Milvus stores the question vectors.
    cache_base = CacheBase(name='mysql', sql_url=CACHE_DB_URL, table_name='gptcache')
    vector_base = VectorBase(name='milvus',
                             host=MILVUS["host"],
                             port=MILVUS["port"],
                             user=MILVUS["user"],
                             password=MILVUS["password"],
                             top_k=1,
                             index_params={
                                 "metric_type": "IP",
                                 "index_type": "IVF_FLAT",
                                 "params": {"nprobe": 10, "nlist": 128}
                             },
                             search_params={
                                 "metric_type": "IP",
                                 "index_type": "IVF_FLAT",
                                 "params": {"nprobe": 10, "nlist": 128}
                             },
                             dimension=1024,
                             collection_name="gptcache")
    data_manager = get_data_manager(cache_base, vector_base, max_size=1000)

    def init_gptcache(cache_obj: Cache, llm: str):
        encoder = LangChain(embeddings=embeddings, dimension=1024)
        cache_obj.init(pre_embedding_func=__get_content_func,
                       data_manager=data_manager,
                       similarity_evaluation=SearchDistanceEvaluation(),
                       embedding_func=encoder.to_embeddings)

    set_llm_cache(GPTCache(init_func=init_gptcache))

if __name__ == "__main__":
    xinference = XinferenceEmbeddings(
        server_url=XINFERENCE_URL, model_uid=EMBEDDING_MODEL_NAME
    )
    init_miic_cache(embeddings=xinference)
    llm = ChatOpenAI(
        model=LLM_MODEL_NAME,
        openai_api_key=API_KEY,
        openai_api_base=API_BASE,
        max_tokens=LLM_MAX_TOKENS,
        temperature=LLM_TEMPERATURE
    )
    # Ask the same question four times; all calls after the first should hit the cache.
    for _ in range(4):
        start_time = time.time()
        message = llm.invoke("你好介绍一下自己")
        print(message)
        print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print("finished")

Expected Behavior

MIIC Cache:你好介绍一下自己
MIIC Cache:你好介绍一下自己
content='你好,我是来自阿里云的大规模语言模型,我叫通义千问。我是一个能够回答问题、创作文字,还能表达观点、撰写代码的超大规模语言模型。我具备丰富的知识和语言理解能力,可以提供各种领域的信息和帮助,无论是科技知识、生活常识,还是创意写作、解决问题,只要是你能想到的,我都可以尽力提供支持。在使用过程中,如果你有任何问题或者需要帮助,都可以随时向我提问哦。\n' response_metadata={'token_usage': {'completion_tokens': 102, 'prompt_tokens': 21, 'total_tokens': 123}, 'model_name': 'miic', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-b754e43a-a258-4c41-b2ae-9bec1146294d-0'
Time consuming: 3.69s
MIIC Cache:你好介绍一下自己
MIIC Cache:你好介绍一下自己
content='你好,我是来自阿里云的大规模语言模型,我叫通义千问。作为一个AI助手,我的主要职责是帮助用户解答问题、提供信息、进行对话等,无论是科技知识、生活常识,还是创意构思、学习辅导,只要用户有需求,我会尽我所能提供帮助。我会不断学习和进步,不断提升自己的能力,为用户提供更精准、更人性化的服务。有什么问题,尽管向我提问吧!\n' response_metadata={'token_usage': {'completion_tokens': 96, 'prompt_tokens': 21, 'total_tokens': 117}, 'model_name': 'miic', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-1117133d-740c-4024-8af2-ff00b890258a-0'
Time consuming: 3.56s
MIIC Cache:你好介绍一下自己
MIIC Cache:你好介绍一下自己
content='你好,我是来自阿里云的大规模语言模型,我叫通义千问。作为一个AI助手,我的主要职责是帮助用户获得准确、有用的信息,解答各种问题,提供语言相关的创作帮助,比如写故事、写公文、写邮件、做翻译等。我会不断学习和进步,不断提升自己的能力,为用户提供更好的服务。如果你有任何问题或者需要帮助,尽管告诉我,我会尽力提供支持。\n' response_metadata={'token_usage': {'completion_tokens': 92, 'prompt_tokens': 21, 'total_tokens': 113}, 'model_name': 'miic', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-49ab01f9-b357-4915-a243-fb44cb31ed65-0'
Time consuming: 3.28s
MIIC Cache:你好介绍一下自己
MIIC Cache:你好介绍一下自己
content='你好,我是来自阿里云的大规模语言模型,我叫通义千问。我是一个能够回答问题、创作文字,还能表达观点、撰写代码的超大规模语言模型。我的目标是帮助用户获得准确、有用的信息,解决他们的问题,提供创新的思路。无论你是在学习、工作还是生活中遇到疑惑,都可以随时向我提问,我会尽我所能提供帮助。让我们一起探索知识的无限边界吧!\n' response_metadata={'token_usage': {'completion_tokens': 95, 'prompt_tokens': 21, 'total_tokens': 116}, 'model_name': 'miic', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-6e66b665-ffcf-4d5d-babd-57020fc42633-0'
Time consuming: 3.49s
finished

The four timings are essentially the same, so presumably nothing is being served from the cache?
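One way to rule out the key-extraction step is to exercise the logic of `__get_content_func` standalone. The payload below is a hypothetical stand-in shaped like a serialized LangChain message list; if the real payload GPTCache passes differs in structure, the extracted key would differ between calls and every lookup would miss:

```python
import json

def get_content_func(data, **_):
    # Same extraction logic as __get_content_func above.
    prompt = data.get("prompt")
    dc = json.loads(prompt)
    result = dc[0].get("kwargs").get("content")
    split_0 = str(result.split("Human:")[-1])
    human = split_0.split("AI:")[0]
    return human

# Hypothetical payload modeled on a serialized LangChain message list.
payload = json.dumps([{"kwargs": {"content": "Human: 你好介绍一下自己 AI:"}}])
print(repr(get_content_func({"prompt": payload})))  # ' 你好介绍一下自己 ' (note surrounding spaces)
```

The log lines `MIIC Cache:你好介绍一下自己` repeating identically suggest extraction is at least consistent across calls, which points the problem at the lookup side rather than the key.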

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

Songjiadong commented 2 months ago

The data does exist in the database, so the insert side seems fine, but it looks like every call performs an insert. Please take a look, something feels wrong: the gptquestion table now contains four records whose question is 你好介绍一下自己.
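Four identical rows are consistent with the lookup side never firing: each call misses, invokes the model, and inserts. A pure-Python toy (not the real GPTCache data manager) contrasting a working miss-then-hit flow against a bypassed lookup:

```python
class ToyCache:
    """Minimal lookup/update cache keyed on the question text."""
    def __init__(self):
        self.rows = []   # stand-in for rows in the gptquestion table
        self.store = {}  # question -> answer

    def lookup(self, q):
        return self.store.get(q)

    def update(self, q, a):
        self.rows.append(q)
        self.store[q] = a

def ask(cache, q, use_lookup):
    hit = cache.lookup(q) if use_lookup else None
    if hit is None:
        cache.update(q, "answer")  # cache miss -> model call + insert

working, broken = ToyCache(), ToyCache()
for _ in range(4):
    ask(working, "你好介绍一下自己", use_lookup=True)
    ask(broken, "你好介绍一下自己", use_lookup=False)

print(len(working.rows), len(broken.rows))  # 1 4
```

With a working lookup only the first call inserts; four inserts mean the lookup path is either bypassed or never finds the row it just wrote.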

SimFG commented 2 months ago

@Songjiadong it seems that you are using invoke, refer to: https://github.com/zilliztech/GPTCache/issues/585#issuecomment-1972720103

Songjiadong commented 2 months ago

@SimFG Could you give me a demo? I didn't quite understand, and I am indeed using invoke.

SimFG commented 2 months ago

The demo is in https://github.com/zilliztech/GPTCache/issues/585#issuecomment-1972720103
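For context, the lazy-init pattern behind `GPTCache(init_func=...)` can be sketched without the real dependencies. This is a simplified assumption about the wrapper's shape, not the langchain_community implementation: one cache object per llm string, with the init function run on first use:

```python
class ToyGPTCacheWrapper:
    """Toy stand-in for langchain_community.cache.GPTCache (assumed shape)."""
    def __init__(self, init_func):
        self.init_func = init_func
        self.caches = {}  # one cache object per llm string

    def _get(self, llm_string):
        if llm_string not in self.caches:
            obj = {}  # stand-in for a gptcache.Cache instance
            self.init_func(obj, llm_string)
            self.caches[llm_string] = obj
        return self.caches[llm_string]

    def lookup(self, prompt, llm_string):
        return self._get(llm_string).get(prompt)

    def update(self, prompt, llm_string, answer):
        self._get(llm_string)[prompt] = answer

inits = []
wrapper = ToyGPTCacheWrapper(lambda obj, llm: inits.append(llm))
wrapper.update("你好介绍一下自己", "miic", "answer")
print(wrapper.lookup("你好介绍一下自己", "miic"))  # answer
print(len(inits))  # 1 (init ran once)
```

The point of the sketch: lookup and update must agree on both the prompt key and the llm string, or every lookup misses even though updates keep landing.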