run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Gemini 1.5 pro caching #14454

Open arthurbrenno opened 6 days ago

arthurbrenno commented 6 days ago

Feature Description

Google is now providing a new way to reduce costs by caching input tokens so they can be referenced in subsequent requests. It would be nice to have this implemented in LlamaIndex.

Docs: https://ai.google.dev/gemini-api/docs/caching?utm_source=gais&utm_medium=email&utm_campaign=june&lang=python

Reason

This can help Gemini users reduce costs when large system prompts are used, or during token-intensive agent tasks.


masci commented 3 days ago

Hi @arthurbrenno and thanks for the feature request!

How do you envision the usage of the cache in LlamaIndex? I'm not sure what the UX should be for two reasons here:

If you have any suggestion about how we could integrate the feature, even in pseudo-code, that would help me a lot!
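One possible UX, sketched below: let the Gemini LLM class accept a handle to a server-side context cache and attach it to every request, so a large system prompt or document set uploaded once is not re-billed on each call. Everything here is hypothetical, `GeminiCacheConfig`, `GeminiStub`, and the `cached_content` request field are illustrative stand-ins, not existing LlamaIndex or Google SDK APIs:

```python
# Hypothetical UX sketch -- none of these names exist in LlamaIndex today.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeminiCacheConfig:
    """Illustrative config for reusing a server-side context cache."""
    cached_content_name: str  # handle returned by Google's caching endpoint
    ttl_seconds: int = 3600   # how long Google should keep the cached tokens


@dataclass
class GeminiStub:
    """Stand-in for a Gemini LLM class extended with a cache option."""
    model: str = "models/gemini-1.5-pro-001"
    cache: Optional[GeminiCacheConfig] = None

    def build_request(self, prompt: str) -> dict:
        # Attach the cached-content handle to each request so the
        # previously uploaded context is referenced instead of resent.
        req = {"model": self.model, "prompt": prompt}
        if self.cache is not None:
            req["cached_content"] = self.cache.cached_content_name
        return req


llm = GeminiStub(
    cache=GeminiCacheConfig(cached_content_name="cachedContents/abc123")
)
print(llm.build_request("Summarize the cached corpus"))
```

The idea is that the user creates the cache once via Google's API, then passes only its name to the LLM constructor; callers of `complete()`/`chat()` would not need to change anything else.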