Open hauntsaninja opened 1 year ago
👋,
I built a port for Go that you can find in the link below.
I am currently using another port in Go: https://github.com/pkoukk/tiktoken-go
Hello @hauntsaninja, I was looking at https://github.com/openai/tiktoken/blob/main/src/lib.rs and it appears to be written in Rust. Could this be published as a crate of its own?
See the FAQ https://github.com/openai/tiktoken/issues/98
@hauntsaninja would it be possible to publish the full test suite publicly? That would make it easier to tell whether a given implementation matches (or is close to) the official implementation.
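Until such a suite exists, a port can at least be checked differentially against a reference on a shared list of inputs. Below is a minimal sketch of that idea, not an official test. The names `find_mismatches`, `reference_encode`, `candidate_encode` and `SAMPLES` are hypothetical, and both encoders are byte-level stand-ins rather than real tokenizers.

```python
from typing import Callable, Iterable, List, Tuple

# An encoder maps text to a list of integer token ids.
Encoder = Callable[[str], List[int]]


def find_mismatches(
    reference: Encoder,
    candidate: Encoder,
    samples: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, expected, actual) for every sample where the two encoders disagree."""
    mismatches = []
    for text in samples:
        expected = reference(text)
        actual = candidate(text)
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches


# Hypothetical stand-in for the reference tokenizer: one "token" per UTF-8 byte.
def reference_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


# Hypothetical stand-in for a port with a bug: it silently drops non-ASCII bytes.
def candidate_encode(text: str) -> List[int]:
    return [b for b in text.encode("utf-8") if b < 128]


# Inputs chosen to cover edge cases: empty string, whitespace, non-English text, emoji.
SAMPLES = ["hello world", "", "  leading spaces", "你好，世界", "emoji 👋"]
```

In practice the stand-ins would be replaced by the real encoders, for example `tiktoken.get_encoding("cl100k_base").encode` as the reference and the port's equivalent as the candidate. Exact equality is the right check here, since even a small difference in token ids can degrade sampling results.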
Here's a pure JavaScript / TypeScript port of tiktoken: https://github.com/niieani/gpt-tokenizer Playground online: https://gpt-tokenizer.dev
Hi, for non-English text such as Chinese, the token counts are incorrect.
Here is the OpenAI token calculator:
@shylockWu they're not incorrect. You've set gpt-tokenizer to tokenize using the GPT-3.5/GPT-4 encoding, whereas the official OpenAI token calculator uses the older GPT-3 encoding. If you switch the playground to the older model, you'll get the same result.
I have built and published a port for Kotlin: https://github.com/aallam/ktoken :)
The following projects are not maintained by OpenAI. I cannot vouch that any of them are correct or safe to use. Use at your own risk.
Note that if a tokeniser fails to exactly match tiktoken's behaviour, you may get worse results when sampling from models, with no warning.
JavaScript
Rust
Java
Ruby
C#
Go
PHP
Kotlin
Thanks to everyone for building useful things!
I'm happy to link to other projects in this comment.