paul-gauthier / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

Prompt Caching with OpenRouter Sonnet 3.5 #1615

Open dkahale opened 5 hours ago

dkahale commented 5 hours ago

Issue

I'm attempting to get the prompt cache to work with OpenRouter using Anthropic's Claude 3.5 Sonnet. I'm using the following command to start aider:

aider --model openrouter/anthropic/claude-3.5-sonnet --cache-prompts --cache-keepalive-pings 5 --no-stream

It doesn't seem to cache anything. I've tried adding files as read-only, adding files to the chat, and asking simple questions about the repo. I still get no information about caching, and the token costs appear to be the same even when I check my OpenRouter credit usage.

Has anyone been able to get prompt caching to work with Claude 3.5 Sonnet on OpenRouter yet?

Version and model info

Aider v0.56.0
Main model: openrouter/anthropic/claude-3.5-sonnet with diff edit format, prompt cache, infinite output
Weak model: openrouter/anthropic/claude-3-haiku-20240307
Git repo: .git with 3,794 files
Warning: For large repos, consider using --subtree-only and .aiderignore
See: https://aider.chat/docs/faq.html#can-i-use-aider-in-a-large-mono-repo
Repo-map: using 1024 tokens, files refresh

fry69 commented 3 hours ago

Thank you for filing this issue.

A few notes on prompt caching:

dkahale commented 2 hours ago

According to the Aider documentation, Aider organizes the chat history to try to cache:

Shouldn't this mean that files added with /add are also cached?

fry69 commented 1 hour ago

I am not sure how clever aider is about prompt caching. It is only possible to create four different cache blocks, and if anything changes in one of those blocks, that block gets invalidated. So if you change even a single character in the edit files block, the whole block gets stored again (at a 25% premium).

This assumes aider is not trying to store every file in its own block, in which case only the first four would be cached (probably even fewer, as blocks get consumed for the system prompt and the repo map too).
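To make the invalidation cost concrete, here is a rough sketch of the arithmetic for a single cache block. It is not aider's or Anthropic's code: the `simulate` helper, the hash-based comparison, and the $3.00 per million input tokens base price are assumptions for illustration. The 1.25x write and 0.10x read multipliers follow the Anthropic prompt caching docs, and the sketch ignores cache expiry.

```python
import hashlib

# Assumed base price in USD per million input tokens (illustrative only).
BASE_PRICE_PER_MTOK = 3.00
CACHE_WRITE_MULT = 1.25  # writing a block to the cache costs 25% more than base input
CACHE_READ_MULT = 0.10   # reading a cached block is billed at 10% of base input


def block_cost(tokens, multiplier):
    """Cost in USD of sending `tokens` input tokens at the given multiplier."""
    return tokens / 1_000_000 * BASE_PRICE_PER_MTOK * multiplier


def simulate(turns):
    """Hypothetical model of one cache block across several requests.

    `turns` is a list of (block_text, token_count) pairs. If the block text
    is identical to the previous request, it counts as a cache read;
    any change, even one character, counts as a fresh cache write.
    """
    cached_digest = None
    results = []
    for text, tokens in turns:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest == cached_digest:
            results.append(("read", block_cost(tokens, CACHE_READ_MULT)))
        else:
            results.append(("write", block_cost(tokens, CACHE_WRITE_MULT)))
            cached_digest = digest
    return results


if __name__ == "__main__":
    turns = [
        ("def f(): return 1", 10_000),  # first request: block is written
        ("def f(): return 1", 10_000),  # unchanged: block is read from cache
        ("def f(): return 2", 10_000),  # one character changed: written again
    ]
    for kind, cost in simulate(turns):
        print(f"{kind:5s} ${cost:.4f}")
```

Under these assumptions, a block that changes on every request is always billed at the write rate, so caching it costs more than not caching it at all.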

In fact, I'd look closely at what is actually happening in the LLM conversation (where those cache_control block markers get set). Try starting aider with the additional argument --llm-history-file <file> and analyze the file after a while (when you are sure that some prompt caching should have happened).
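A quick way to do that analysis could look like the sketch below. It assumes the markers show up literally as the string `cache_control` in the history file, which I have not verified; the function name `find_cache_markers` is made up for this example.

```python
def find_cache_markers(path, marker="cache_control"):
    """Return (line_number, snippet) pairs for every line containing `marker`.

    Assumption: the file written by --llm-history-file is plain text and the
    cache markers appear in it verbatim. If nothing is found, either no
    markers were set or they are not logged in this form.
    """
    hits = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if marker in line:
                hits.append((lineno, line.strip()[:80]))
    return hits
```

Calling `find_cache_markers("history.txt")` (the file name is a placeholder) returns an empty list when no markers are present, which would suggest checking how the request is actually built before looking at billing.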

For more details on how prompt caching works, see https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching