Closed — windy31 closed this 3 weeks ago
Those numbers look normal! With prompt caching the majority of input tokens will be from cache reads, since every request is hitting the cache for the entire conversation history over and over again. Here's a video I posted a while ago explaining it a bit more: https://x.com/sdrzn/status/1824054511524016514
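To make the mechanism concrete, here's a minimal sketch of why cache reads dominate: each request marks a large, stable prefix (e.g. the system prompt) with `cache_control`, as in Anthropic's prompt-caching API. The first request writes that prefix to the cache; every later request in the conversation re-reads it, so cache-read tokens pile up much faster than fresh input tokens. The model name and `build_request` helper below are illustrative, not the app's actual code.

```python
def build_request(system_prompt, history, new_message):
    """Build an Anthropic Messages API payload whose stable prefix
    (the system prompt) is marked as cacheable."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # cache_control marks a cache breakpoint: the first
                # request is a cache write, subsequent requests with the
                # same prefix are (much cheaper) cache reads.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # The growing conversation history is resent on every turn,
        # which is why cached tokens accumulate quickly.
        "messages": history + [{"role": "user", "content": new_message}],
    }

payload = build_request("You are a coding assistant.", [], "Refactor foo()")
```

You'd pass a dict like this to `client.messages.create(**payload)`; the usage block in the response then reports `cache_creation_input_tokens` and `cache_read_input_tokens` separately from regular input tokens.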
Here's an example of me working with a project:
It seems like every time I do anything, the app adds everything to the cache and doesn't use regular input tokens. I'm not sure if that's the correct way of working with the cache, so I might be doing something wrong. Is there a way to specify what to put into the cache, or is this normal behavior?
Also, it seems like caching hits the limit (1M tokens) faster as well, but I might be mistaken.