run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Gemini 1.5 pro caching #14454

Open arthurbrenno opened 6 days ago

arthurbrenno commented 6 days ago

Feature Description

Google is now providing a new way to reduce costs by caching input tokens so they can be referenced in subsequent requests. It would be nice to have this implemented in LlamaIndex.

Docs: https://ai.google.dev/gemini-api/docs/caching?utm_source=gais&utm_medium=email&utm_campaign=june&lang=python

Reason

This can help Gemini users reduce costs when large system prompts are used, or during token-intensive agent tasks.


masci commented 3 days ago

Hi @arthurbrenno and thanks for the feature request!

How do you envision the usage of the cache in LlamaIndex? I'm not sure what the UX should be for two reasons here:

If you have any suggestion about how we could integrate the feature, even in pseudo-code, that would help me a lot!
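One possible UX, sketched below: let the Gemini LLM class accept a handle to a server-side context cache and attach it to every request, so a large system prompt or document set uploaded once is not re-billed on each call. Everything here is hypothetical, `GeminiCacheConfig`, `GeminiStub`, and the `cached_content` request field are illustrative stand-ins, not existing LlamaIndex or Google SDK APIs:

```python
# Hypothetical UX sketch -- none of these names exist in LlamaIndex today.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeminiCacheConfig:
    """Illustrative config for reusing a server-side context cache."""
    cached_content_name: str  # handle returned by Google's caching endpoint
    ttl_seconds: int = 3600   # how long Google should keep the cached tokens


@dataclass
class GeminiStub:
    """Stand-in for a Gemini LLM class extended with a cache option."""
    model: str = "models/gemini-1.5-pro-001"
    cache: Optional[GeminiCacheConfig] = None

    def build_request(self, prompt: str) -> dict:
        # Attach the cached-content handle to each request so the
        # previously uploaded context is referenced instead of resent.
        req = {"model": self.model, "prompt": prompt}
        if self.cache is not None:
            req["cached_content"] = self.cache.cached_content_name
        return req


llm = GeminiStub(
    cache=GeminiCacheConfig(cached_content_name="cachedContents/abc123")
)
print(llm.build_request("Summarize the cached corpus"))
```

The idea is that the user creates the cache once via Google's API, then passes only its name to the LLM constructor; callers of `complete()`/`chat()` would not need to change anything else.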