huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

Support for Golang now, or a CLI for other languages? #1601

Open xuxiaoxia96 opened 1 month ago

xuxiaoxia96 commented 1 month ago

Hey team, thanks for this great library! It lets us use the tokenizer without installing the whole transformers library. Are there any plans for Go bindings over the Rust implementation, or for a Go implementation written from scratch?

Also, where would one start if writing it from scratch in Go?

Narsil commented 1 month ago

I think other projects are maintaining their own: https://pkg.go.dev/github.com/gomlx/tokenizers#section-readme

We are not currently planning to support it, due to the low demand (compared to Python).

janpfeifer commented 2 weeks ago

Sorry, github.com/gomlx/tokenizers is still under construction (I updated the README.md to reflect that).

I paused work on it recently because I'm currently porting the Gemma model to GoMLX, for which I'm using the recent https://github.com/eliben/go-sentencepiece. Maybe it can help you, @xuxiaoxia96?