
google-genai [feature]: Context Caching #23259

Open · afirstenberg opened this issue 2 months ago

afirstenberg commented 2 months ago


Gemini now lets a developer create a context cache with the system instructions, contents, tools, and model information already set, and then reference that cache as part of a standard query. Content must be cached explicitly (i.e., it is not cached automatically as part of a request or reply), and a cache expiration can be set at creation time and changed later.
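
For reference, a minimal sketch of the underlying flow using the `google-generativeai` SDK; the model name, display name, document text, and TTL values here are illustrative:

```python
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="...")  # assumes an API key is available

# Create the cache explicitly, with system instructions, contents,
# model, and an expiration (TTL) already set.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # illustrative model name
    display_name="reference-docs",        # illustrative
    system_instruction="Answer using only the attached document.",
    contents=["<large reference document text>"],
    ttl=datetime.timedelta(minutes=30),
)

# The expiration can be changed after the cache is created.
cache.update(ttl=datetime.timedelta(hours=2))
```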

It does not appear to be supported in Vertex AI at this time.


abhiaagarwal commented 1 month ago

I'd absolutely love this feature. I've currently hacked together my own BaseChatModel for this, and it has cut my costs substantially. DeepSeek just announced something similar; I wouldn't be surprised if this becomes a trend in the industry.

From an API design perspective, it would make sense to have a cache parameter (or kwarg) on SystemMessage/HumanMessage that signals to the chat model that it should cache that message. A rough sketch of what that could look like is below.
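
A hypothetical sketch of that shape; the `cache` flag below is not an existing LangChain parameter and is carried in `additional_kwargs` purely for illustration:

```python
from langchain_core.messages import HumanMessage, SystemMessage

# Hypothetical: a per-message cache hint that a provider integration
# could translate into a context-cache create/lookup under the hood.
messages = [
    SystemMessage(
        content="<extensive system instructions or reference material>",
        additional_kwargs={"cache": True},  # assumed flag, not a real API
    ),
    HumanMessage(content="Summarize the cached reference material."),
]
```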

aperepel commented 3 days ago

I'm reading the comments and getting the feeling you'd like to cache messages. IMO, the primary use case for this feature is caching reference data: large documents, audio/video, or perhaps extensive, elaborate system-instruction text.

It looks like caching is configured at LLM object creation time, and the model is then used normally (see the sketch below); would LangChain play a role here at all? Maybe there are other use cases where additional LangChain constructs could help streamline the experience.
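
For context, a minimal sketch of that creation-time flow with the `google-generativeai` SDK, reusing a `cache` handle like the one created earlier in this thread:

```python
import google.generativeai as genai

# The cache is bound to the model at construction time; afterwards the
# model is queried exactly like an uncached one.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

response = model.generate_content("What does the document say about pricing?")
print(response.text)
```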

And, as always, please think of GenAI and Vertex AI in tandem when designing this; they are essentially the consumer and enterprise sides of the same AI platform.

Thanks!