huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

Make `USED_PARALLELISM` atomic #1532

Closed: nathaniel-daniel closed this 3 weeks ago

nathaniel-daniel commented 1 month ago

Fixes #1491

This fixes undefined behavior (UB) in accesses to the `USED_PARALLELISM` variable; the rationale is described in #1491. Additionally, `static mut` is difficult to use correctly, to the point where it is likely to be removed in a future edition. I have therefore replaced the `static mut` `bool` with an `AtomicBool`, which can be read and written without `unsafe`.
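A minimal sketch of the pattern described above, assuming the flag is only ever set to `true` and queried. The helper names `mark_parallelism_used` and `has_parallelism_been_used` are hypothetical illustrations, not necessarily the functions in the tokenizers crate, and the choice of `Ordering::SeqCst` is a conservative assumption rather than what the PR uses:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Before (racy): `static mut USED_PARALLELISM: bool = false;`
// Every access needed `unsafe`, and concurrent writes/reads from
// several threads would be a data race, i.e. undefined behavior.
//
// After: an `AtomicBool` lives in a plain, non-`mut` static and
// provides interior mutability, so no `unsafe` is required.
static USED_PARALLELISM: AtomicBool = AtomicBool::new(false);

/// Hypothetical helper: record that a parallel code path has run.
pub fn mark_parallelism_used() {
    // SeqCst is the most conservative ordering; a weaker one may suffice
    // for a standalone flag, but that is a judgment call not made here.
    USED_PARALLELISM.store(true, Ordering::SeqCst);
}

/// Hypothetical helper: query whether parallelism has been used.
pub fn has_parallelism_been_used() -> bool {
    USED_PARALLELISM.load(Ordering::SeqCst)
}
```

Because `store` and `load` take `&self`, any thread can call these helpers safely; for example, a worker spawned with `std::thread::spawn(mark_parallelism_used)` can set the flag and the main thread will observe it after `join`.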

This approach is better than #1492: wrapping a single `bool` in a mutex is wasteful, since every read or write must acquire a lock, whereas an atomic load or store needs none.
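For comparison, here is a hypothetical mutex-based version of the same flag. It is an illustration only, not the code proposed in #1492. It is also sound, but each access takes a lock and has to deal with lock poisoning, which is the overhead the atomic approach avoids:

```rust
use std::sync::Mutex;

// Hypothetical mutex-guarded flag (illustration only).
// `Mutex::new` is a `const fn` (stable since Rust 1.63), so it can
// initialize a `static` directly.
static USED_PARALLELISM_LOCKED: Mutex<bool> = Mutex::new(false);

/// Hypothetical helper: set the flag by taking the lock.
pub fn mark_parallelism_used_locked() {
    // `lock()` returns `Err` if another thread panicked while holding
    // the lock (poisoning); `unwrap` propagates that panic here.
    *USED_PARALLELISM_LOCKED.lock().unwrap() = true;
}

/// Hypothetical helper: read the flag by taking the lock.
pub fn has_parallelism_been_used_locked() -> bool {
    *USED_PARALLELISM_LOCKED.lock().unwrap()
}
```

Both versions are free of `unsafe`; the difference is that the `AtomicBool` variant compiles down to a single atomic instruction per access instead of a lock/unlock pair.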

Narsil commented 3 weeks ago

Thanks a lot for this!

Much better indeed than the existing code.

HuggingFaceDocBuilderDev commented 3 weeks ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Narsil commented 3 weeks ago

Let's ignore the Clippy lints; I'll tackle them in another PR.