Closed — windy31 closed this 3 weeks ago
Those numbers look normal! With prompt caching the majority of input tokens will be from cache reads, since every request is hitting the cache for the entire conversation history over and over again. Here's a video I posted a while ago explaining it a bit more: https://x.com/sdrzn/status/1824054511524016514
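To make the mechanism concrete, here's a minimal sketch of why cache reads dominate: each request marks a large, stable prefix (e.g. the system prompt) with `cache_control`, as in Anthropic's prompt-caching API. The first request writes that prefix to the cache; every later request in the conversation re-reads it, so cache-read tokens pile up much faster than fresh input tokens. The model name and `build_request` helper below are illustrative, not the app's actual code.

```python
def build_request(system_prompt, history, new_message):
    """Build an Anthropic Messages API payload whose stable prefix
    (the system prompt) is marked as cacheable."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # cache_control marks a cache breakpoint: the first
                # request is a cache write, subsequent requests with the
                # same prefix are (much cheaper) cache reads.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # The growing conversation history is resent on every turn,
        # which is why cached tokens accumulate quickly.
        "messages": history + [{"role": "user", "content": new_message}],
    }

payload = build_request("You are a coding assistant.", [], "Refactor foo()")
```

You'd pass a dict like this to `client.messages.create(**payload)`; the usage block in the response then reports `cache_creation_input_tokens` and `cache_read_input_tokens` separately from regular input tokens.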
Here's an example of me working with a project:
It seems like every time I do anything, the app adds everything to the cache and doesn't use regular input tokens. I'm not sure if that's the correct way of working with the cache, so I might be doing something wrong. Is there a way to specify what to put into the cache, or is this normal behavior?
Also, it seems like caching hits the limit (1M tokens) faster as well, but I might be mistaken.