huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
8.93k stars, 777 forks

Support operating computer system #1457

Closed Southpika closed 6 months ago

Southpika commented 7 months ago

Hello, I want to use this package at my company, so I would like to know which operating systems the tokenizers package supports. Is there a minimum version requirement on Linux/Mac/Windows? Also, if I run it from a C++ program, what should I pay attention to? Looking forward to your answer. Many thanks.

Narsil commented 6 months ago

It runs pretty much everywhere on consumer hardware.

For use from C++, you probably want to link directly to the Rust side through a C compatibility layer, which we do not provide. It should mostly come down to writing a few function calls: https://docs.rust-embedded.org/book/interoperability/c-with-rust.html
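To illustrate what such a C compatibility layer could look like, here is a minimal sketch. It is not part of tokenizers: `DemoTokenizer` and the three `demo_tokenizer_*` functions are hypothetical names, and the whitespace splitting is only a stand-in so the example compiles without extra dependencies. In a real binding the struct would presumably wrap the crate's own tokenizer type instead.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for the real tokenizer. An actual binding would
/// wrap the tokenizer type from the `tokenizers` crate here instead.
pub struct DemoTokenizer {
    max_tokens: usize,
}

impl DemoTokenizer {
    /// Placeholder logic: count whitespace-separated pieces, capped at `max_tokens`.
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count().min(self.max_tokens)
    }
}

// Note: on Rust edition 2024 the attribute is spelled `#[unsafe(no_mangle)]`.

/// Allocates a tokenizer and hands an opaque pointer to the C/C++ caller.
#[no_mangle]
pub extern "C" fn demo_tokenizer_new(max_tokens: usize) -> *mut DemoTokenizer {
    Box::into_raw(Box::new(DemoTokenizer { max_tokens }))
}

/// Returns the token count for a NUL-terminated UTF-8 string,
/// or -1 if a pointer is null or the bytes are not valid UTF-8.
#[no_mangle]
pub unsafe extern "C" fn demo_tokenizer_count(
    handle: *const DemoTokenizer,
    text: *const c_char,
) -> i64 {
    if handle.is_null() || text.is_null() {
        return -1;
    }
    // The caller must pass a pointer obtained from `demo_tokenizer_new`.
    let tokenizer = unsafe { &*handle };
    match unsafe { CStr::from_ptr(text) }.to_str() {
        Ok(s) => tokenizer.count_tokens(s) as i64,
        Err(_) => -1,
    }
}

/// Frees a tokenizer created by `demo_tokenizer_new`. Null is ignored.
#[no_mangle]
pub unsafe extern "C" fn demo_tokenizer_free(handle: *mut DemoTokenizer) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

On the C++ side these would be declared inside an `extern "C"` block and linked against the Rust library built as a `staticlib` or `cdylib`. The opaque-pointer pattern keeps Rust types out of the C++ headers; the caller is responsible for calling the free function exactly once per handle.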

Narsil commented 6 months ago

Please note that tokenizers already uses threads, so do not use threading in your own library, or disable it by setting TOKENIZERS_PARALLELISM=0 as an environment variable.
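If the program that uses tokenizers is launched from another process, one way to apply the setting above is to pass the variable when spawning it. The sketch below uses only the Rust standard library; the helper name `command_without_tokenizer_threads` is hypothetical, and only the variable name and value come from the comment above.

```rust
use std::process::Command;

/// Builds a command for `program` with tokenizers' internal threading
/// disabled through the TOKENIZERS_PARALLELISM environment variable.
/// The command is only configured here; the caller decides when to spawn it.
pub fn command_without_tokenizer_threads(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("TOKENIZERS_PARALLELISM", "0");
    cmd
}
```

Setting the variable on the child process, rather than mutating the current process environment, avoids touching global state in a program that may already be running its own threads.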