I am using go-openai to call the Braintrust AI proxy, which provides access to models from OpenAI, Anthropic, Google, AWS, Mistral, and third-party inference providers through a single, unified (OpenAI-compatible) API.
When requesting Anthropic models with prompt caching, I need to send this kind of request:
```shell
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.\n"
      },
      {
        "type": "text",
        "text": "<the entire contents of Pride and Prejudice>",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'
```
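For context, the closest I can currently get through go-openai looks roughly like the sketch below (the proxy base URL and API key are placeholders for my Braintrust setup); there is nowhere to attach `cache_control` to the system message that carries the large document:

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	cfg := openai.DefaultConfig("<braintrust-api-key>")
	cfg.BaseURL = "https://<my-braintrust-proxy>/v1" // placeholder proxy endpoint
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model:     "claude-3-5-sonnet-20241022",
		MaxTokens: 1024,
		Messages: []openai.ChatCompletionMessage{
			{
				Role:    openai.ChatMessageRoleSystem,
				Content: "You are an AI assistant tasked with analyzing literary works.",
			},
			{
				// This is the message I would want to mark with
				// cache_control, but there is no field for it today.
				Role:    openai.ChatMessageRoleSystem,
				Content: "<the entire contents of Pride and Prejudice>",
			},
			{
				Role:    openai.ChatMessageRoleUser,
				Content: "Analyze the major themes in Pride and Prejudice.",
			},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```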
Would you consider adding a `CacheControl` field to `ChatCompletionMessage` for this kind of use case, even though it is not part of the OpenAI API per se?
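For illustration only, something along these lines would cover my use case; the `CacheControl` type, field name, and JSON tag below are just a suggestion on my side, not existing go-openai API:

```go
// Hypothetical sketch, not existing go-openai types.
type CacheControl struct {
	Type string `json:"type"` // e.g. "ephemeral"
}

type ChatCompletionMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
	// ... existing fields ...

	// Proposed addition: omitted from the JSON when nil, so plain
	// OpenAI requests are unchanged, while proxies such as Braintrust
	// could forward it to Anthropic's prompt caching.
	CacheControl *CacheControl `json:"cache_control,omitempty"`
}
```

which would let callers write something like:

```go
msg := openai.ChatCompletionMessage{
	Role:         openai.ChatMessageRoleSystem,
	Content:      "<the entire contents of Pride and Prejudice>",
	CacheControl: &openai.CacheControl{Type: "ephemeral"}, // hypothetical field
}
```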