xebia-functional / xef

Building applications with LLMs through composability, in Kotlin, Scala, ...
https://xef.ai
Apache License 2.0

Tokenizer: Support GPT4o o200k encoding #748

Closed: realdavidvega closed this 5 months ago

realdavidvega commented 5 months ago

This PR adds the GPT-4o o200k encoding (o200k_base in tiktoken) to the KMP (Kotlin Multiplatform) Tokenizer.

Test cases were generated with a small Python script using the tiktoken library.
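
The script itself is not included here, so the following is only a rough sketch of how such test cases could be produced. The sample strings, the helper names (`build_cases`, `o200k_encoder`, `render_cases`) and the JSON output shape are assumptions for illustration, not what the PR actually used. The only tiktoken calls relied on are `tiktoken.get_encoding("o200k_base")` and `Encoding.encode`.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable

# Hypothetical sample inputs; the PR's real test strings are not shown.
SAMPLES = ["hello world", "Kotlin Multiplatform", "こんにちは", ""]


def build_cases(encode: Callable[[str], list[int]],
                samples: Iterable[str]) -> list[dict]:
    """Pair each sample text with the token ids returned by `encode`."""
    return [{"text": s, "tokens": list(encode(s))} for s in samples]


def o200k_encoder() -> Callable[[str], list[int]]:
    """Return the encode function of tiktoken's o200k_base encoding.

    Imported lazily so the rest of the module works without tiktoken.
    """
    import tiktoken  # third-party: pip install tiktoken

    return tiktoken.get_encoding("o200k_base").encode


def render_cases(cases: list[dict]) -> str:
    """Serialize cases as JSON (an assumed format) for use as fixtures."""
    return json.dumps(cases, ensure_ascii=False, indent=2)
```

With tiktoken installed, `print(render_cases(build_cases(o200k_encoder(), SAMPLES)))` would print the expected token ids, which could then be copied into the tokenizer's tests as reference values.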