afirstenberg opened this issue 2 months ago (status: Open)
I'd absolutely love this feature. I've currently hacked together my own BaseChatModel
for this, and it has cut my costs considerably. DeepSeek just announced something similar; I wouldn't be surprised if this becomes a trend across the industry.
From an API design perspective, it would make sense to have a cache parameter (or kwarg) on SystemMessage/HumanMessage
that signals to the chat model that the message should be cached.
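A minimal sketch of what that message-level flag could look like. This is purely illustrative: `cache` and `cache_ttl` are hypothetical names for the proposed parameter, not attributes that LangChain messages have today, and the stand-in `SystemMessage` class here is a plain dataclass, not the real LangChain type.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch only: `cache` and `cache_ttl` are illustrative
# names for the proposed flag, not part of LangChain's API.
@dataclass
class SystemMessage:
    content: str
    cache: bool = False              # opt this message into provider-side caching
    cache_ttl: Optional[int] = None  # optional expiration, in seconds


def partition_for_cache(messages):
    """Split a prompt into the cacheable prefix and the live remainder,
    as a chat-model integration might do before calling the provider."""
    cached = [m for m in messages if getattr(m, "cache", False)]
    live = [m for m in messages if not getattr(m, "cache", False)]
    return cached, live


msgs = [
    SystemMessage("Large reference document ...", cache=True, cache_ttl=3600),
    SystemMessage("Answer briefly."),
]
cached, live = partition_for_cache(msgs)
print(len(cached), len(live))  # -> 1 1
```

The integration would then create (or reuse) a provider-side cache for the flagged messages and send only the remainder with each request.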
I'm reading the comments and getting the feeling you'd like to cache messages. IMO, the primary use case for this feature is caching reference data: larger docs, audio/video, or perhaps long, elaborate system-instruction text.
It looks like the caching is configured at LLM-object creation time, and the object is then used normally; would LangChain play a role here at all? Maybe there are other use cases where additional LangChain constructs could streamline the experience.
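To make "configured at creation time" concrete, here is a toy sketch of how a chat-model wrapper could carry a cache handle set at construction and merely reference it on each call. Everything here is hypothetical: `ChatGemini`, the `cached_content` parameter, and the `cachedContents/abc123` id are illustrative stand-ins, though the idea of referencing a cache by name in the request matches how Gemini's explicit caching is described.

```python
from typing import Optional

# Hypothetical wrapper: `ChatGemini` and `cached_content` are illustrative
# names, not a real LangChain class or parameter.
class ChatGemini:
    def __init__(self, model: str, cached_content: Optional[str] = None):
        self.model = model
        # Handle to a previously created cache, e.g. "cachedContents/abc123"
        # (an invented id for illustration).
        self.cached_content = cached_content

    def invoke(self, prompt: str) -> dict:
        """Build a request that references the cache instead of resending it."""
        req = {"model": self.model, "contents": [{"text": prompt}]}
        if self.cached_content:
            req["cachedContent"] = self.cached_content
        return req


llm = ChatGemini("gemini-1.5-pro-001", cached_content="cachedContents/abc123")
req = llm.invoke("Summarize the cached docs.")
print(req["cachedContent"])  # -> cachedContents/abc123
```

If the provider SDK already handles this at construction, LangChain's added value might be limited to surfacing the cache handle cleanly and managing its lifecycle (creation, TTL updates, deletion).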
And, as always, please think of GenAI and Vertex AI in tandem when designing things; they are essentially the consumer and enterprise sides of the same AI.
Thanks!
Privileged issue
Issue Content
Gemini now allows a developer to create a context cache with the system instructions, contents, tools, and model information already set, and then reference this cache as part of a standard query. Content must be cached explicitly (i.e., it is not automatic as part of a request or reply), and a cache expiration can be set (and later changed).
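A rough sketch of the request body for creating such a cache via Gemini's `cachedContents` REST resource. Field names (`systemInstruction`, `ttl`, the `models/...` prefix) follow the public docs as best I recall; treat the exact shape as an assumption and check the current API reference before relying on it.

```python
import json

# Sketch of a cachedContents creation body -- field names are an
# assumption based on the public Gemini API docs, not verified here.
def build_cached_content(model: str, system_text: str, ttl_seconds: int) -> dict:
    return {
        # The model is fixed at cache-creation time, per the issue description.
        "model": f"models/{model}",
        "systemInstruction": {"parts": [{"text": system_text}]},
        # Expiration is a duration string; it can be updated later.
        "ttl": f"{ttl_seconds}s",
    }


body = build_cached_content("gemini-1.5-flash-001", "Long reference corpus ...", 3600)
print(json.dumps(body, indent=2))
```

Subsequent generate requests would then reference the returned cache name rather than resending the system instructions and contents.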
It does not appear to be supported in Vertex AI at this time.
Open issues:
References: