lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0

Add caching when using API like GPT4o or Gemini #258

Closed · dhuynh95 closed 3 months ago

dhuynh95 commented 3 months ago

We are aware (and sorry) that the default option for our open-source framework is OpenAI, but it is currently the only working solution for the WorldModel.

We do realize it takes a toll on your credits, so I think an immediate and temporary solution is to provide caching to reduce the number of calls.

I looked at GPTCache but it is outdated and does not support OpenAI 1.X. They have an old PR from March that could work.
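Even without GPTCache, the core idea is just memoizing identical requests. A rough sketch against the OpenAI 1.X client (the in-memory dict and the `cached_chat` helper are purely illustrative, not something in our codebase):

```python
import hashlib
import json

from openai import OpenAI

client = OpenAI()
_cache: dict = {}


def cached_chat(model: str, messages: list) -> object:
    """Return a previously stored completion when the exact same request repeats."""
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.completions.create(model=model, messages=messages)
    return _cache[key]


response = cached_chat("gpt-4o", [{"role": "user", "content": "Summarize this page."}])
```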

Is anyone interested in making their PR work and integrating it into LaVague? Thanks!

alhridoy commented 3 months ago

Hi @dhuynh95, I would love to work on this issue. You could assign it to me!

dhuynh95 commented 3 months ago

Hi @alhridoy! That would be great! Do you need help with this? I think it's pretty orthogonal based on how they designed it, but if we can help, we would be happy to.

alhridoy commented 3 months ago

> Hi @alhridoy! That would be great! Do you need help with this? I think it's pretty orthogonal based on how they designed it, but if we can help, we would be happy to.

I have a few technical questions to ensure a smooth integration:

Cache Initialization:

Is there a preferred location or module within the LaVague project where you recommend initializing GPTCache? I was considering adding it in lavague/core/world_model.py.

API Modifications:

Are there specific API calls within LaVague that you suggest prioritizing for caching to optimize performance effectively?

Integration:

Should the cache be integrated globally across all modules that use OpenAI/Gemini APIs, or are there particular sections/modules that would benefit most from this caching?

Testing:

Are there any specific tests or benchmarks you would like me to run to validate the integration of GPTCache within LaVague?
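For reference, the upstream GPTCache usage I would be adapting looks roughly like this; their adapter still targets the pre-1.0 OpenAI client, which is exactly the compatibility gap mentioned above:

```python
from gptcache import cache
from gptcache.adapter import openai  # GPTCache's drop-in wrapper for the legacy client

cache.init()            # exact-match cache by default
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what is GitHub"}],
)
```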

dhuynh95 commented 3 months ago

Cache initialization

I guess we could do it at the agent level, where we could have an option to enable caching. We also have contexts to package some configurations. You can find the OpenAI context here.
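Something along these lines is what I have in mind at the agent level (the `use_cache` flag does not exist yet, it is just the kind of option we could expose on top of the usual quick-tour setup):

```python
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver

driver = SeleniumDriver(headless=True)
world_model = WorldModel()
action_engine = ActionEngine(driver)

# Hypothetical flag: turn on caching of LLM calls for the whole agent
agent = WebAgent(world_model, action_engine, use_cache=True)
```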

API Modifications

I think the OpenAI calls are the ones to prioritize, because their models are the best so far for our web navigation and action use cases. There does not seem to be a need to change the API calls in our codebase, as GPTCache works directly on OpenAI.

Integration

I don't have strong opinions yet on where to integrate it (agent or context), so maybe we can just try and see what makes most sense.

Testing

I guess you could run the quick tour twice, with and without caching, and see how much time it takes. It would also be interesting to measure the token consumption in both cases.

As we rely on llama_index for LLM calls, I guess we could play with their observability tooling to check how many calls we make and how many tokens we consume.
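For instance, llama_index's token counting callback should give us a rough baseline (module paths assume llama_index >= 0.10):

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Record token usage for every LLM call that goes through llama_index
token_counter = TokenCountingHandler()
Settings.callback_manager = CallbackManager([token_counter])

# ... run the quick tour here, once with caching and once without ...

print("prompt tokens:    ", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
print("total tokens:     ", token_counter.total_llm_token_count)
```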

What do you think?