huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
8.93k stars, 779 forks

Allow `huggingface_hub<1.0` #1385

Closed. Wauplin closed 10 months ago

Wauplin commented 11 months ago

Follow-up PR after https://github.com/huggingface/tokenizers/pull/1383, and especially https://github.com/huggingface/tokenizers/pull/1383#issuecomment-1794980028. I still believe that we should not block huggingface_hub from introducing breaking changes. However, for the most used and least change-prone APIs (like the upload/download ones), I'm more than fine with considering them "fixed until at least v1.0".

I've looked into the tokenizers implementation, and given that it only uses hf_hub_download, let's set huggingface_hub<1.0 as the dependency. This will reduce friction for users who want to install the latest huggingface_hub version. If at any point tokenizers starts using other methods, let's reassess and choose a solution described in https://github.com/huggingface/tokenizers/pull/1383#issuecomment-1794980028. In the meantime, I think we should merge this PR.
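To illustrate what an upper bound like `huggingface_hub<1.0` admits, here is a minimal sketch in plain Python. It is not how pip resolves requirements: the helper names `release_tuple` and `is_below` are hypothetical, and the comparison only looks at the numeric release segments, ignoring pre-release and local version suffixes.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segments, e.g. '0.18.0' -> (0, 18, 0).

    Simplification (assumption): suffixes such as 'rc1' or '.dev0' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(version: str, upper_bound: str) -> bool:
    """True if `version` is strictly below `upper_bound`, as in '<1.0'."""
    v = release_tuple(version)
    bound = release_tuple(upper_bound)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(v), len(bound))
    v = v + (0,) * (width - len(v))
    bound = bound + (0,) * (width - len(bound))
    return v < bound


if __name__ == "__main__":
    # Any 0.x release passes the '<1.0' bound; 1.0 itself does not.
    for candidate in ["0.17", "0.18.0", "1.0", "1.0.0"]:
        print(candidate, is_below(candidate, "1.0"))
```

With a bound like this, users can upgrade huggingface_hub across 0.x releases without hitting a dependency conflict, while a future 1.0 release would still be excluded until the constraint is revisited.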

(also related: once merged and released you should be able to delete https://github.com/huggingface/tokenizers/pull/1377 @clefourrier)

HuggingFaceDocBuilderDev commented 11 months ago

The documentation is not available anymore as the PR was closed or merged.

TheBloke commented 10 months ago

Oh thank god! 😁

 [pytorch2] tomj@MC:/workspace/process ᐅ history | grep huggingface-hub
 8761  3.10.2023 13:29  pip3 install huggingface-hub'>=0.17'
 8954  13.10.2023 08:07  pip3 freeze | grep huggingface-hub
 9189  22.10.2023 10:47  pip3 install -U huggingface-hub
 9349  27.10.2023 11:20  pip3 install huggingface-hub'>=0.18'
 9628  29.10.2023 17:14  pip3 install huggingface-hub'>=0.18.0'
10045* 2.11.2023 17:17  pip3 install huggingface-hub'>=0.18.0'
10054* 2.11.2023 17:33  pip3 install huggingface-hub'>=0.18.0'
10359* 3.11.2023 18:30  pip3 install huggingface-hub'>=0.18.0'
10666* 5.11.2023 02:22  pip3 install huggingface-hub'>=0.18.0'
11513* 8.11.2023 16:54  pip3 install --upgrade huggingface-hub
11845* 10.11.2023 08:42  pip3 install --upgrade huggingface-hub
11847* 10.11.2023 08:46  pip3 install --upgrade huggingface-hub
11916* 10.11.2023 11:52  pip3 install --upgrade huggingface-hub

Thank you!

clefourrier commented 10 months ago

@ArthurZucker Atm, transformers depends on tokenizers>=0.14,<0.15. Do you know in which version that will change?

Wauplin commented 10 months ago

@clefourrier this PR was merged this morning to unpin tokenizers: https://github.com/huggingface/transformers/pull/27494. I guess starting from the next release then :)

clefourrier commented 10 months ago

Missed it, ty!

ArthurZucker commented 10 months ago

I'll do a patch today to support this version as well 😉