guidance-ai / guidance

A guidance language for controlling large language models.

Tracking Token Usage #330

Open TaylorAndStubbs opened 1 year ago

TaylorAndStubbs commented 1 year ago

Is your feature request related to a problem? Please describe.
I need to keep track of my various clients' token usage. The OpenAI output includes completion_tokens in the following format: "prompt_tokens": 123, "completion_tokens": 55, "total_tokens": 178

And it would be super cool if those could be in the output variables.

Describe the solution you'd like
program()["completion_tokens"] = 81, etc.
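In other words, something like this (a hypothetical sketch: the usage variables don't exist yet, and the model and template are just for illustration):

```python
import guidance

guidance.llm = guidance.llms.OpenAI("text-davinci-003")
program = guidance("Tell me a joke: {{gen 'joke'}}")
executed = program()

print(executed["joke"])               # works today
print(executed["prompt_tokens"])      # hypothetical, e.g. 123
print(executed["completion_tokens"])  # hypothetical, e.g. 55
print(executed["total_tokens"])       # hypothetical, e.g. 178
```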

Describe alternatives you've considered
Tokenizing the input and output myself. I could not find a way to get the raw input and output sent to OpenAI in order to count the tokens.
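For reference, the counting itself is easy with OpenAI's tiktoken library; getting the raw strings out of guidance is the part I couldn't solve (raw_prompt and raw_response below are placeholders for those strings):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    # Approximate: chat requests add a few tokens of per-message
    # overhead on top of the raw text encoding.
    return len(enc.encode(text))

prompt_tokens = count_tokens(raw_prompt)        # text sent to OpenAI
completion_tokens = count_tokens(raw_response)  # text that came back
```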


williambrach commented 1 year ago

Any updates?

lcp-lchilds commented 1 year ago

I need this as well.

Harsha-Nori commented 1 year ago

Great suggestion! Out of curiosity, if guidance needs to make multiple API calls across a program, would you prefer a granular per-call breakdown or just a simple aggregation of all the calls?

TaylorAndStubbs commented 1 year ago

I would prefer a per-call breakdown since I could always aggregate it later if I needed to. Would asking for both be feasible?

fullstackwebdev commented 1 year ago

I would like to have this feature as well. I tried to see if I could add it myself, but to my surprise, in chat-completion streaming mode the 'usage' object is missing when I print the OpenAI response.

So I tried a second approach (admittedly hacky): counting the tokens myself from the prompt and the final response (probably not what everyone wants, but it would work for me). However, I couldn't figure out how to collect and return that data in streaming mode.
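This is roughly how far I got, as a sketch against the pre-1.0 openai client directly; wiring it into guidance's streaming path is the part I can't work out:

```python
import openai
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
messages = [{"role": "user", "content": "Tell me a joke."}]

# The streamed chunks carry no 'usage' object, so collect the text
# deltas as they arrive and count tokens once the stream is done.
parts = []
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo", messages=messages, stream=True
):
    parts.append(chunk["choices"][0]["delta"].get("content", ""))

completion_text = "".join(parts)
completion_tokens = len(enc.encode(completion_text))  # approximate
```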

Any hints on how to do this?

mjedmonds commented 1 year ago

+1, and I would also love to be able to get prompt_tokens prior to execution, so that we can determine how many tokens are left for the completion (for error handling).
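For context, the kind of pre-flight check I have in mind (a rough sketch, assuming tiktoken and a known context window for the model; the names and numbers are placeholders):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def completion_budget(prompt_text: str, context_window: int = 4096) -> int:
    """Approximate tokens left for the completion after the prompt."""
    return context_window - len(enc.encode(prompt_text))

# Fail fast before making the call if the prompt is too long.
if completion_budget("my full prompt here") < 200:
    raise ValueError("prompt leaves too little room for the completion")
```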

jscheel commented 1 year ago

100% agree that this would be super useful. As a real-world example, I have written a token wallet that allocates and tracks token usage per user. It would be incredibly useful to have a callback triggered before and after each LLM call; that way I could withdraw tokens from the wallet at each step and bail out of execution early if the wallet is empty.
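Neither hook exists in guidance today, but the shape I'm imagining is something like this (hypothetical sketch):

```python
class TokenWallet:
    """Per-user token budget, with hooks guidance could invoke
    around each LLM request (the hooks themselves don't exist yet)."""

    def __init__(self, balance: int):
        self.balance = balance

    def before_call(self, estimated_prompt_tokens: int) -> None:
        # Bail out early if the wallet can't cover the next call.
        if estimated_prompt_tokens > self.balance:
            raise RuntimeError("token wallet is empty; aborting program")

    def after_call(self, usage: dict) -> None:
        # Withdraw what the call actually consumed, using the usage
        # dict from the API response.
        self.balance -= usage.get("total_tokens", 0)
```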

MichaelOwenDyer commented 1 year ago

Bump. This would be very useful to have.

NickSmet commented 1 year ago

+1, would be mega-useful!

In the meantime, I've just modified the "OpenAIResponse" class in "openai_response.py" by adding print(data['usage']) at the end 😅

jan-ninja commented 1 year ago

+1 here too. I'm trying to track latency and token counts to compare different prompts and guidance programs.

avion23 commented 6 months ago

Has there been any progress?

I skimmed the open pull requests and found nothing.