When you send a request with Prompt Caching enabled:

1. The system checks whether the prompt prefix is already cached from a recent query.
2. If it is, the cached version is used, reducing processing time and costs.
3. Otherwise, the full prompt is processed and the prefix is cached for future use.
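As a concrete illustration, here is a minimal sketch of a request against the Anthropic Messages API with a cache breakpoint on a large system block. The model id and prompt contents are placeholders; the first call writes the cache, and later calls with the same prefix read from it:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_CONTEXT = "<imagine thousands of tokens of docs or code here>"

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model id
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a helpful coding assistant."},
        {
            "type": "text",
            "text": LARGE_CONTEXT,
            # Everything up to and including this block becomes the cached prefix.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the context above."}],
)

# The usage object reports how the prefix was billed:
# cache_creation_input_tokens on the first call (a cache write),
# cache_read_input_tokens on subsequent calls with the same prefix.
print(response.usage)
```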
- Cache write tokens are 25% more expensive than base input tokens.
- Cache read tokens are 90% cheaper than base input tokens.
- Regular input and output tokens are priced at standard rates.
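To see what those multipliers mean in dollars, here is a tiny cost helper. The $3-per-million base rate is only an illustrative figure (check current pricing for your model), but the 1.25x and 0.10x multipliers follow directly from the percentages above:

```python
# Illustrative base input price; substitute your model's actual rate.
BASE_INPUT_PER_MTOK = 3.00  # USD per million input tokens

def prompt_cost_usd(input_tokens, cache_write_tokens, cache_read_tokens):
    """Writes cost 1.25x base, reads cost 0.10x base, regular input costs 1x."""
    return (
        input_tokens * BASE_INPUT_PER_MTOK
        + cache_write_tokens * BASE_INPUT_PER_MTOK * 1.25
        + cache_read_tokens * BASE_INPUT_PER_MTOK * 0.10
    ) / 1_000_000

# A 100k-token prefix plus a 500-token question, three ways:
print(prompt_cost_usd(100_500, 0, 0))    # no caching:              $0.3015
print(prompt_cost_usd(500, 100_000, 0))  # first turn, cache write: $0.3765
print(prompt_cost_usd(500, 0, 100_000))  # later turns, cache read: $0.0315
```

So the first turn costs about 25% more than an uncached one, and every turn after that costs roughly a tenth as much.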
This is especially useful for:

- Prompts with many examples
- Large amounts of context or background information
- Repetitive tasks with consistent instructions
- Long multi-turn conversations (see the sketch after this list)
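For the multi-turn case, one hedged sketch (assuming the same Anthropic SDK as above, with a placeholder model id) is to move a single cache breakpoint to the newest user turn on every call, so the growing conversation prefix keeps getting cached incrementally:

```python
import anthropic

client = anthropic.Anthropic()
history: list[dict] = []  # full conversation so far, as content-block lists

def ask(question: str) -> str:
    # Only a handful of cache breakpoints are allowed per request, so clear
    # the one from the previous turn before setting a new one.
    for msg in history:
        for block in msg["content"]:
            block.pop("cache_control", None)

    # Mark the newest user turn; everything up to and including this block
    # becomes the cached prefix for the next request.
    history.append({
        "role": "user",
        "content": [{
            "type": "text",
            "text": question,
            "cache_control": {"type": "ephemeral"},
        }],
    })

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model id
        max_tokens=1024,
        messages=history,
    )
    reply = response.content[0].text
    history.append({
        "role": "assistant",
        "content": [{"type": "text", "text": reply}],
    })
    return reply
```

Each call then pays cache-write rates only for the newest turns, while the earlier turns are read from cache.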
This seems especially useful for large codebases where only a small section changes between requests: you can send the entire codebase every time, and the unchanged prefix is served from the cache.
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#how-prompt-caching-works