microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

Add support for request caching #3637

Open ekzhu opened 5 days ago

ekzhu commented 5 days ago

What feature would you like to be added?

Implement a caching mechanism for LLM API calls to avoid unnecessary API calls, similar to the caching available in v0.2.

When enabled, this feature should let us retrieve cached responses for identical LLM requests instead of making new API calls. Ideally, it would include a configuration flag to enable or disable caching, as well as a way to manage the cache.

We don't need to follow the same API as in v0.2; the cache can be managed by the model client instead.
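
As a rough illustration of the client-managed approach, a caching layer could wrap the model client and key responses on the serialized request. This is only a minimal sketch; the names here (`CachedModelClient`, `enable_cache`, `clear_cache`, and the `create` signature) are hypothetical and not part of any existing AutoGen API.

```python
# Hypothetical sketch of a cache-wrapping model client; all names here are
# illustrative, not part of the actual AutoGen API.
import hashlib
import json
from typing import Any, Dict, List


class CachedModelClient:
    """Wraps a model client and serves identical requests from a cache."""

    def __init__(self, client: Any, enable_cache: bool = True) -> None:
        self._client = client
        self._enable_cache = enable_cache  # configuration flag to toggle caching
        self._cache: Dict[str, Any] = {}   # in-memory store; could be disk-backed

    def _key(self, messages: List[Dict[str, str]], **kwargs: Any) -> str:
        # Deterministic key over the full request payload, so only truly
        # identical requests hit the cache.
        payload = json.dumps({"messages": messages, **kwargs}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def create(self, messages: List[Dict[str, str]], **kwargs: Any) -> Any:
        if not self._enable_cache:
            return await self._client.create(messages, **kwargs)
        key = self._key(messages, **kwargs)
        if key not in self._cache:  # cache miss: make the real API call
            self._cache[key] = await self._client.create(messages, **kwargs)
        return self._cache[key]

    def clear_cache(self) -> None:
        # Simple cache-management hook.
        self._cache.clear()
```

One advantage of wrapping at the client level rather than exposing a separate cache API (as in v0.2) is that caching stays transparent to agents: any code that accepts a model client gets caching for free when handed the wrapped client.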

Why is this needed?

Save cost on identical inference requests.

ekzhu commented 5 days ago

@husseinmozannar moved your issue here.