jxnl / instructor

structured outputs for llms
https://python.useinstructor.com/
MIT License
7.67k stars · 608 forks

Support Usage Tokens Output in Claude API #667

Open exa256 opened 4 months ago

exa256 commented 4 months ago

Is your feature request related to a problem? Please describe.
Currently, reporting the usage dictionary from the OpenAI API is supported, as shown in the usage docs: https://python.useinstructor.com/concepts/usage/?h=token+usage

However, the Claude API patch does not have this functionality, even though usage is available in a successful 200 response from Anthropic's server (see https://docs.anthropic.com/en/api/messages):

{
  "content": [
    {
      "text": "Hi! My name is Claude.",
      "type": "text"
    }
  ],
  "id": "msg_013Zva2CMHLNnXjNJJKqJ2EF",
  "model": "claude-3-opus-20240229",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 25
  }
}
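For reference, the usage block in the response above can be modeled with a small schema. This is an illustrative sketch (the `Usage` model name and the `total_tokens` property are hypothetical, not part of Anthropic's SDK; Anthropic's response only carries `input_tokens` and `output_tokens`):

```python
from pydantic import BaseModel


class Usage(BaseModel):
    # Mirrors the "usage" portion of Anthropic's messages response.
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        # Anthropic does not return a total; summing here is an assumption.
        return self.input_tokens + self.output_tokens


raw = {"input_tokens": 10, "output_tokens": 25}
usage = Usage(**raw)
print(usage.total_tokens)  # 35
```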

Describe the solution you'd like
Instructor should patch Claude's API and surface the usage dictionary as the second element of the returned tuple, like so:

structured_output, completion = client.chat.completions.create_with_completion(...)
completion.usage  # should return usage, consisting of input and output tokens
Elijas commented 4 months ago

If you use Anthropic Claude through LiteLLM, the usage and cost get reported:

import instructor
from litellm import completion, completion_cost, cost_per_token
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_litellm(completion)

resp, completion = client.chat.completions.create_with_completion(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

usage = completion.usage
input_tokens = usage.prompt_tokens
output_tokens = usage.completion_tokens
total_tokens = usage.total_tokens

input_cost_usd, output_cost_usd = cost_per_token(
    "claude-3-opus-20240229",
    prompt_tokens=input_tokens,
    completion_tokens=output_tokens,
)
completion_cost_usd = completion_cost(completion_response=completion)

ssonal commented 4 months ago

> Describe the solution you'd like Instructor should patch Claude's API and surface the usage dictionary as part of the output in the second tuple like so:

structured_output._raw_response.usage works but doesn't take retries into account.

@jxnl maybe we attach cumulative usage data here? It's currently getting lost while processing the response: https://github.com/jxnl/instructor/blob/081418d59a397b38a1b66fe58a64ef94f9124a6b/instructor/process_response.py#L97-L100

Elijas commented 4 months ago

> structured_output._raw_response.usage works but doesn't take retries into account.
>
> @jxnl maybe we attach cumulative usage data here? It's currently getting lost while processing the response.
>
> https://github.com/jxnl/instructor/blob/081418d59a397b38a1b66fe58a64ef94f9124a6b/instructor/process_response.py#L97-L100

related: https://github.com/jxnl/instructor/issues/715
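A minimal sketch of the cumulative-usage idea discussed above. This is illustrative only: `Usage` is a stand-in dataclass and `accumulate_usage` is a hypothetical helper, not instructor internals.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Stand-in mirroring Anthropic's usage fields.
    input_tokens: int = 0
    output_tokens: int = 0


def accumulate_usage(total: Usage, attempt: Usage) -> Usage:
    # Sum token counts across retry attempts so usage from earlier
    # (failed) completions is not lost when their responses are discarded.
    return Usage(
        input_tokens=total.input_tokens + attempt.input_tokens,
        output_tokens=total.output_tokens + attempt.output_tokens,
    )


# Two attempts: one that failed validation, one that succeeded.
attempts = [Usage(10, 25), Usage(12, 30)]
total = Usage()
for u in attempts:
    total = accumulate_usage(total, u)
print(total)  # Usage(input_tokens=22, output_tokens=55)
```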

pradeepdas commented 3 weeks ago

usage and other completion params don't work for Iterables:

'list' object has no attribute '_raw_response'
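Until iterable results carry usage directly, one possible workaround is to sum usage per item. This is a sketch under a stated assumption: each item exposes a `_raw_response` with a `usage` attribute, which, per the error above, does not hold today for plain-list results; `sum_iterable_usage` is a hypothetical helper.

```python
from types import SimpleNamespace


def sum_iterable_usage(items):
    # Sum token usage across the items of an iterable extraction,
    # skipping items that lack a raw response.
    input_tokens = output_tokens = 0
    for item in items:
        usage = getattr(getattr(item, "_raw_response", None), "usage", None)
        if usage is not None:
            input_tokens += usage.input_tokens
            output_tokens += usage.output_tokens
    return input_tokens, output_tokens


# Demo with stand-in objects (not real instructor results):
fake = [
    SimpleNamespace(_raw_response=SimpleNamespace(
        usage=SimpleNamespace(input_tokens=10, output_tokens=5))),
    SimpleNamespace(),  # item without a raw response is skipped
]
print(sum_iterable_usage(fake))  # (10, 5)
```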