tryAGI / Tiktoken

This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo models, using the `cl100k_base` encoding.
https://github.com/openai/tiktoken
MIT License
48 stars · 2 forks

Token counting for a whole message #31

Open Xan-Kun opened 3 months ago

Xan-Kun commented 3 months ago

Not sure if this would fit into the scope of this project, but could be a real killer feature, since none of the others do it. If not, please feel free to delete :-)

What would you like to be added:

Be able to pass a whole OpenAI Message object into a function and get the complete token count back.

Why is this needed:

So far, counting the tokens of a complete OpenAI message is quite tricky, since a message can now include multiple parts, functions, tools, etc. As far as I know, there is no C# lib that supports this, MS doesn't seem to be adding any value here (on the contrary :-) ), and it seems everyone wants to count tokens for messages, not just plain text.

Anything else we need to know?

I tried to implement it following this answer: https://stackoverflow.com/a/77175648/4821032. There is also a TypeScript library that seems to come very close: https://github.com/hmarr/openai-chat-tokens

P.S.: I think it is only really needed for outgoing (prompt) messages, since the incoming chat objects already contain the actual token count.
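For reference, the counting scheme from the linked SO answer (which mirrors the OpenAI cookbook approach) can be sketched as follows. This is a sketch, not the library's implementation: the overhead constants (3 tokens per message, 1 for an explicit `name` field, 3 to prime the reply) are assumptions that hold for gpt-3.5-turbo and gpt-4 as documented in the cookbook, and may drift for newer models. The `encode` parameter is deliberately pluggable so the logic is independent of any particular tokenizer.

```python
def num_tokens_from_messages(messages, encode,
                             tokens_per_message=3, tokens_per_name=1):
    """Count tokens for a list of {"role": ..., "content": ...} dicts.

    `encode` is any callable mapping a string to a list of tokens,
    e.g. tiktoken.get_encoding("cl100k_base").encode.
    """
    total = 0
    for message in messages:
        total += tokens_per_message  # <|start|>{role}\n...<|end|> wrapper
        for key, value in message.items():
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name  # "name" field costs one extra token
    return total + 3  # every reply is primed with <|start|>assistant<|message|>
```

Function and tool definitions need additional handling on top of this, which is exactly the gap the hmarr/openai-chat-tokens library tries to fill.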

HavenDV commented 3 months ago

I'm not sure if this should be part of this library; maybe in https://github.com/tryAGI/OpenAI? But that may not be an option if you already depend heavily on another OpenAI SDK. My idea is to have a client that is completely generated from the OpenAPI specification (with some additional extensions/constructors for convenience) to provide support for new features on the day they are released. For this, I'm putting effort into developing https://github.com/HavenDV/OpenApiGenerator, because Kiota/NSwag couldn't handle it, at least when I started.

Although this sounds quite ambitious, I'm actually making pretty good progress on this. It will also allow generating the same for any other SDK based on an OpenAPI specification, which is very important for the rapid development of a library with a large number of integrations (LangChain .NET).

Xan-Kun commented 3 months ago

I see. Since OpenAI doesn't really give us the specs, especially not in a machine-friendly way, that really shouldn't go in here. IMHO there are a few topics where it's heavily unclear how to automate properly: message token counting, price estimation, and context window size. It would be really nice if OpenAI gave us an API endpoint for those (and included them in the OpenAI OpenAPI [sic] spec) :-).
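Of the three topics, price estimation is at least mechanical once a token count exists; the hard part is only keeping the rate table current. A minimal sketch, using hypothetical placeholder rates (actual per-1K-token prices change over time and must be taken from OpenAI's pricing page):

```python
# (input, output) USD per 1K tokens -- placeholder values, NOT current prices
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0005, 0.0015),
}

def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate request cost from prompt/completion token counts."""
    in_rate, out_rate = PRICES_PER_1K[model]
    return prompt_tokens / 1000 * in_rate + completion_tokens / 1000 * out_rate
```

This is exactly the kind of table that goes stale silently, which is why an official endpoint (or a field in the spec) would be so valuable.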

Xan-Kun commented 3 months ago

BTW, I added your library to this highly viewed SO answer ;-) https://stackoverflow.com/a/75804651/4821032