afirstenberg opened this issue 2 months ago (status: Open)
I'd absolutely love this feature. I've currently hacked together my own BaseChatModel
for this, and it has cut my costs considerably. DeepSeek just announced something similar; I wouldn't be surprised if this becomes a trend across the industry.
From an API design perspective, it would make sense to have a cache parameter (or kwarg) on SystemMessage/HumanMessage
that signals to the chat model that the message should be cached.
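A minimal sketch of what that message-level flag could look like. This is purely illustrative: `cache` and `cache_ttl` are hypothetical names for the proposed parameter, not attributes that LangChain messages have today, and the stand-in `SystemMessage` class here is a plain dataclass, not the real LangChain type.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch only: `cache` and `cache_ttl` are illustrative
# names for the proposed flag, not part of LangChain's API.
@dataclass
class SystemMessage:
    content: str
    cache: bool = False              # opt this message into provider-side caching
    cache_ttl: Optional[int] = None  # optional expiration, in seconds


def partition_for_cache(messages):
    """Split a prompt into the cacheable prefix and the live remainder,
    as a chat-model integration might do before calling the provider."""
    cached = [m for m in messages if getattr(m, "cache", False)]
    live = [m for m in messages if not getattr(m, "cache", False)]
    return cached, live


msgs = [
    SystemMessage("Large reference document ...", cache=True, cache_ttl=3600),
    SystemMessage("Answer briefly."),
]
cached, live = partition_for_cache(msgs)
print(len(cached), len(live))  # -> 1 1
```

The integration would then create (or reuse) a provider-side cache for the flagged messages and send only the remainder with each request.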
I'm reading the comments and getting the feeling you'd like to cache messages. IMO, the primary use case for this feature is caching reference data: larger docs, audio/video, or perhaps long, elaborate system-instruction text.
It looks like the caching is configured at LLM-object creation time, and the object is then used normally; would LangChain play a role here at all? Maybe there are other use cases where additional LangChain constructs could streamline the experience.
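To make "configured at creation time" concrete, here is a toy sketch of how a chat-model wrapper could carry a cache handle set at construction and merely reference it on each call. Everything here is hypothetical: `ChatGemini`, the `cached_content` parameter, and the `cachedContents/abc123` id are illustrative stand-ins, though the idea of referencing a cache by name in the request matches how Gemini's explicit caching is described.

```python
from typing import Optional

# Hypothetical wrapper: `ChatGemini` and `cached_content` are illustrative
# names, not a real LangChain class or parameter.
class ChatGemini:
    def __init__(self, model: str, cached_content: Optional[str] = None):
        self.model = model
        # Handle to a previously created cache, e.g. "cachedContents/abc123"
        # (an invented id for illustration).
        self.cached_content = cached_content

    def invoke(self, prompt: str) -> dict:
        """Build a request that references the cache instead of resending it."""
        req = {"model": self.model, "contents": [{"text": prompt}]}
        if self.cached_content:
            req["cachedContent"] = self.cached_content
        return req


llm = ChatGemini("gemini-1.5-pro-001", cached_content="cachedContents/abc123")
req = llm.invoke("Summarize the cached docs.")
print(req["cachedContent"])  # -> cachedContents/abc123
```

If the provider SDK already handles this at construction, LangChain's added value might be limited to surfacing the cache handle cleanly and managing its lifecycle (creation, TTL updates, deletion).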
And, as always, please think of GenAI and Vertex AI in tandem when designing things; they are essentially the consumer and enterprise sides of the same AI.
Thanks!
Privileged issue
Issue Content
Gemini now allows a developer to create a context cache with the system instructions, contents, tools, and model information already set, and then reference this cache as part of a standard query. Content must be cached explicitly (i.e., it is not automatic as part of a request or reply), and a cache expiration can be set (and later changed).
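A rough sketch of the request body for creating such a cache via Gemini's `cachedContents` REST resource. Field names (`systemInstruction`, `ttl`, the `models/...` prefix) follow the public docs as best I recall; treat the exact shape as an assumption and check the current API reference before relying on it.

```python
import json

# Sketch of a cachedContents creation body -- field names are an
# assumption based on the public Gemini API docs, not verified here.
def build_cached_content(model: str, system_text: str, ttl_seconds: int) -> dict:
    return {
        # The model is fixed at cache-creation time, per the issue description.
        "model": f"models/{model}",
        "systemInstruction": {"parts": [{"text": system_text}]},
        # Expiration is a duration string; it can be updated later.
        "ttl": f"{ttl_seconds}s",
    }


body = build_cached_content("gemini-1.5-flash-001", "Long reference corpus ...", 3600)
print(json.dumps(body, indent=2))
```

Subsequent generate requests would then reference the returned cache name rather than resending the system instructions and contents.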
It does not appear to be supported in Vertex AI at this time.
Open issues:
References: