huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
8.93k stars, 779 forks

unable to install on python 3.12 via pip #1393

Closed: binary-husky closed this 8 months ago

binary-husky commented 10 months ago

Reproducing:

  1. conda create -n py12 python=3.12
  2. conda activate py12
  3. pip install tokenizers

    
    pip install tokenizers
    Collecting tokenizers
    Using cached tokenizers-0.15.0.tar.gz (318 kB)
    Installing build dependencies ... done
    Getting requirements to build wheel ... done
    Preparing metadata (pyproject.toml) ... error
    error: subprocess-exited-with-error
    
    × Preparing metadata (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [6 lines of output]
    
      Cargo, the Rust package manager, is not installed or is not on PATH.
      This package requires Rust and Cargo to compile extensions. Install it through
      the system's package manager or via https://rustup.rs/
    
      Checking for Rust toolchain....
      [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed
    
    × Encountered error while generating package metadata.
    ╰─> See above for output.
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.

ArthurZucker commented 10 months ago

You need to have cargo installed because we did not push wheels for mac with py>3.10. @Narsil, should we try to go up to 3.11 / 3.12, or are we limited by the runners?

binary-husky commented 10 months ago

> You need to have cargo installed because we did not push wheels for mac with py>3.10.

No, I'm using Windows on x86, not Mac. Additionally, I still cannot install tokenizers after running pip install cargo. (On Python 3.11, everything works fine even without Cargo.)

ArthurZucker commented 10 months ago

Sorry for the confusion. You have to install Rust, and probably not through pip. We compiled wheels for Python versions up to 3.11, not above.
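The reasoning here is that when no prebuilt wheel matches the interpreter, pip falls back to the source distribution, and building that needs a Rust toolchain with Cargo on PATH (as the error message says). A minimal sketch of that check, assuming, per the comment above, that wheels exist only for Python versions up to 3.11; `LAST_WHEEL_VERSION`, `needs_source_build` and `cargo_on_path` are hypothetical names, and per-platform gaps (such as the Mac wheels mentioned earlier) are not modeled.

```python
import shutil
import sys

# Assumption from the comment above: prebuilt wheels were compiled for Python
# versions up to 3.11 only. Per-platform gaps (such as macOS) are not modeled.
LAST_WHEEL_VERSION = (3, 11)


def needs_source_build(version: tuple) -> bool:
    """True if no prebuilt wheel is expected for `version`, so pip would
    fall back to compiling the sdist, which requires Rust and Cargo."""
    return tuple(version[:2]) > LAST_WHEEL_VERSION


def cargo_on_path() -> bool:
    """True if a `cargo` executable can be found on PATH."""
    return shutil.which("cargo") is not None


if __name__ == "__main__":
    current = sys.version_info[:2]
    if needs_source_build(current) and not cargo_on_path():
        print(
            f"Python {current[0]}.{current[1]}: no prebuilt wheel expected and "
            "Cargo is not on PATH; install Rust first (e.g. via https://rustup.rs/)."
        )
    else:
        print("Either a prebuilt wheel is expected or Cargo is available.")
```

The check only looks for a `cargo` executable on PATH, which matches what the build step complains about; a toolchain installed through rustup or the system package manager puts it there.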

Narsil commented 10 months ago

@ArthurZucker Would you be up for doing a patch release that adds 3.12 support?

ArthurZucker commented 10 months ago

Sure, I'll work on this this week 😉

charlesdsmith commented 10 months ago

Hey @ArthurZucker, could the lack of compiled wheels also be the reason for the following error? I'm on Python 3.11.4.

    /lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: ELF load command past end of file

I can't get chromadb to import tokenizers; it says the module doesn't exist even though it does. I'm not sure whether I should open a separate issue.

Narsil commented 10 months ago

> ELF load command past end of file

This means the file is corrupted, probably from an unfinished download or something along those lines. Delete the folder and install again, I think.
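As we read the message, it means a loadable (PT_LOAD) segment described in the shared object's program headers extends beyond the end of the file, which is what a truncated download would produce. A minimal sketch that checks this with the standard library only; `is_truncated_elf` is a hypothetical helper (not part of tokenizers), and only 64-bit ELF files are handled.

```python
import struct
import sys
from pathlib import Path

PT_LOAD = 1           # program header type for a loadable segment
ELF64_PHDR_SIZE = 56  # sizeof(Elf64_Phdr)


def is_truncated_elf(data: bytes) -> bool:
    """Return True if the program header table or any PT_LOAD segment
    of a 64-bit ELF image extends past the end of `data`."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (or header is truncated)")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("only 64-bit ELF files are handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian

    # Elf64_Ehdr fields: e_phoff at 0x20, e_phentsize and e_phnum at 0x36/0x38.
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + ELF64_PHDR_SIZE > len(data):
            return True  # the program header table itself is cut off
        # Elf64_Phdr begins: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
            endian + "IIQQQQ", data, off
        )
        if p_type == PT_LOAD and p_offset + p_filesz > len(data):
            return True  # segment needs bytes the file does not contain
    return False


if __name__ == "__main__":
    for name in sys.argv[1:]:
        status = "TRUNCATED" if is_truncated_elf(Path(name).read_bytes()) else "ok"
        print(f"{status}: {name}")
```

Usage (the script name is hypothetical): python check_elf.py path/to/tokenizers.cpython-311-x86_64-linux-gnu.so. If it reports TRUNCATED, deleting the package folder and reinstalling, as suggested above, is the expected fix.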

ArthurZucker commented 10 months ago

See #1406, which will try to release wheels for Windows with Python 3.12.

ArthurZucker commented 8 months ago

Christmas delays; planning a release for tomorrow 🤗

t5k6 commented 8 months ago

Another option is to install tokenizers through conda:

    conda install conda-forge::tokenizers