Open AiharaMahiru opened 1 week ago
🥰 Description of requirements
Some APIs, such as OpenAI / Claude / MOONSHOT, already support prompt caching, which can significantly reduce the cost of multi-turn question answering.
🧐 Suggested solution
Provide a switch option.
📝 Additional information
No response
👀 @AiharaMahiru
Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible. Please make sure you have given us as much context as possible.
OpenAI's prompt caching is enabled by default; no additional settings are required.
Anthropic Claude's cached prompts are only valid for 5 minutes, so I don't think it is a good fit for this project.
✅ @AiharaMahiru
This issue is closed. If you have any questions, you can comment and reply.
I actually do have plans to implement Anthropic's caching.
> I actually do have plans to implement Anthropic's caching.

I don't think this makes much sense. The typical use case is a single-purpose chatbot, e.g. a company's customer-service bot, which calls the API many times within a short window and whose prompts share the same prefix.

This project is mostly for personal use. Although different assistants have different built-in system prompts: (1) the user may not call the same assistant repeatedly within 5 minutes; (2) the system prompt can be changed at any time.

If every chat writes to the cache but gets no cache hit within 5 minutes, the overall cost increases by 25%.

Anthropic supports at most 4 cache-control breakpoints. Letting users choose where to insert them would disproportionately increase the cognitive load, because other model providers either do not support prompt caching or support it in very different ways.
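For reference, a minimal sketch of what such a cache-control breakpoint looks like in an Anthropic Messages API request body. The `build_request` helper and the model string are illustrative, not project code; only the payload shape follows Anthropic's documented format:

```python
# Sketch: an Anthropic Messages API payload with a cache-control
# breakpoint on the system prompt. Helper name and model string are
# illustrative; the payload shape follows Anthropic's documentation.
def build_request(system_prompt: str, user_text: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # "ephemeral" caches this prefix for roughly 5 minutes
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

req = build_request("You are a helpful assistant.", "Hello")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

Each block marked this way becomes one of the (at most 4) breakpoints; everything before it is cached as a prefix.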
@BrandonStudio It does make sense. Caching the system prompt alone is very valuable: the Artifacts prompt is about 4,000 tokens, so the default cache pays for itself after just one extra round of dialogue, not to mention scenarios like a crawler plugin pulling back a very long text (~10k tokens) in one request.

There is also the file-upload case. Combined with prompt caching, I could implement a full-text upload solution, and the savings there would be even more significant.

As for the interaction, this will not be left for users to operate manually; it will be applied only to specific types of context, such as the system role, tool-call returns, and PDF file contents.

Also, when I tested this earlier, not all content could be cached: if the user content is shorter than some number of tokens (I forget the exact value), adding the cache marker throws an error outright. So I will run a check before caching, and skip the cache if the string length is below a certain threshold.
The problem is still the 5-minute cache lifetime. How do you guarantee that adding this feature reduces cost rather than increasing it?
> The problem is still the 5-minute cache lifetime. How do you guarantee that adding this feature reduces cost rather than increasing it?

That's why providing a switch is the safe choice. PS: I personally often ask about long pieces of code; a single turn is 3–4k tokens, and I usually accumulate around 20k tokens within about three minutes (Sonnet).
In that case, a timer should probably be added as well.
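One way to read the timer suggestion: only pay the cache-write premium when requests are arriving fast enough that a follow-up within the 5-minute TTL is likely. A hypothetical sketch, assuming the class and threshold names below (none of this is project code):

```python
import time

# Hypothetical sketch: track time since the last request and only
# enable cache writes when the previous request fell within the
# 5-minute TTL, i.e. the user is in an active conversation.
CACHE_TTL_SECONDS = 5 * 60

class CacheGate:
    def __init__(self):
        self.last_request = None

    def should_cache(self, now=None):
        now = time.monotonic() if now is None else now
        prev, self.last_request = self.last_request, now
        return prev is not None and (now - prev) < CACHE_TTL_SECONDS

gate = CacheGate()
print(gate.should_cache(now=0.0))    # False (first request)
print(gate.should_cache(now=120.0))  # True  (2 minutes later)
print(gate.should_cache(now=900.0))  # False (13 minutes later)
```

A refinement would be to refresh the cache proactively just before the TTL expires, but that itself costs a request, so the break-even depends on usage patterns.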