Open tischi opened 11 months ago
related issues: #24 #34
Hi @haesleinhuepf @tischi, I'm the maintainer of LiteLLM (https://github.com/BerriAI/litellm). We let you do cost tracking for 100+ LLMs.
Docs: https://docs.litellm.ai/docs/#calculate-costs-usage-latency
```python
from litellm import completion, completion_cost
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)

cost = completion_cost(completion_response=response)
print("Cost for completion call with gpt-3.5-turbo: ", f"${float(cost):.10f}")
```
We also let you create a self-hosted, OpenAI-compatible proxy server to make your LLM calls (100+ LLMs) and track costs and token usage. Docs: https://docs.litellm.ai/docs/simple_proxy
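For reference, a minimal sketch of what a proxy configuration might look like, assuming LiteLLM's `model_list` config format as described in the proxy docs linked above (the exact key names and schema may have changed, so treat this as illustrative, not authoritative):

```yaml
# config.yaml — hypothetical minimal LiteLLM proxy config
model_list:
  - model_name: gpt-3.5-turbo        # name clients will request
    litellm_params:
      model: gpt-3.5-turbo           # underlying provider model
      api_key: os.environ/OPENAI_API_KEY
```

Since the proxy is OpenAI-compatible, existing OpenAI client code should only need its base URL pointed at the proxy.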
I hope this is helpful; if not, I'd love your feedback on what we can improve.
If we want to compute the price for a request, we may not have to count the tokens ourselves, as the counts are already provided in the model response: https://platform.openai.com/docs/guides/text-generation/chat-completions-response-format
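To illustrate, the chat-completions response carries a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`, so pricing a request reduces to multiplying those counts by per-token rates. A minimal sketch, where the `PRICE_PER_1K` rates are illustrative placeholders (not current OpenAI pricing) and `completion_price` is a hypothetical helper, not part of any library:

```python
# Illustrative per-1000-token prices in USD; look up the real rates
# for the model you actually use.
PRICE_PER_1K = {
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}

def completion_price(model: str, usage: dict) -> float:
    """Price one request from the token counts the API already reports."""
    rates = PRICE_PER_1K[model]
    return (usage["prompt_tokens"] / 1000 * rates["prompt"]
            + usage["completion_tokens"] / 1000 * rates["completion"])

# "usage" as it appears in a chat-completions response payload:
usage = {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
print(f"${completion_price('gpt-3.5-turbo', usage):.6f}")
```

This avoids re-tokenizing the prompt locally; the server-side counts are the ones you are actually billed for.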