huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

prebuilt darwin arm64 wheels #1026

Closed tekumara closed 8 months ago

tekumara commented 2 years ago
$ pip install tokenizers
Collecting tokenizers
  Downloading tokenizers-0.12.1.tar.gz (220 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 220.7/220.7 kB 4.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [51 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-12.4-arm64-cpython-39
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers
      copying py_src/tokenizers/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/models
      copying py_src/tokenizers/models/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/models
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/decoders
      copying py_src/tokenizers/decoders/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/decoders
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/normalizers
      copying py_src/tokenizers/normalizers/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/normalizers
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/pre_tokenizers
      copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/pre_tokenizers
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/processors
      copying py_src/tokenizers/processors/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/processors
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/trainers
      copying py_src/tokenizers/trainers/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/trainers
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/implementations
      creating build/lib.macosx-12.4-arm64-cpython-39/tokenizers/tools
      copying py_src/tokenizers/tools/__init__.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/tools
      copying py_src/tokenizers/tools/visualizer.py -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/tools
      copying py_src/tokenizers/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers
      copying py_src/tokenizers/models/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/models
      copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/decoders
      copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/normalizers
      copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/pre_tokenizers
      copying py_src/tokenizers/processors/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/processors
      copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/trainers
      copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.macosx-12.4-arm64-cpython-39/tokenizers/tools
      running build_ext
      running build_rust
      error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Environment: Python 3.9 on darwin arm64, pip 22.1.2.

Would it be possible to provide prebuilt darwin arm64 wheels?
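
For context on why pip falls back to the source tarball here: pip only installs a wheel whose filename tags match the running interpreter and platform. The sketch below parses a wheel filename using the standard naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag) and checks whether it targets macOS on arm64. The function names and the example filenames are hypothetical, chosen for illustration; they are not a list of files actually published for tokenizers, and pip's real matching logic covers more cases than this.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its tags (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def covers_macos_arm64(filename: str) -> bool:
    """True if any platform tag in the filename targets macOS on arm64."""
    tags = parse_wheel_filename(filename)
    # A filename may carry several platform tags joined by dots.
    return any(
        tag.startswith("macosx_") and tag.endswith(("_arm64", "_universal2"))
        for tag in tags.platform_tag.split(".")
    )


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for candidate in (
        "tokenizers-0.12.1-cp39-cp39-macosx_11_0_arm64.whl",
        "tokenizers-0.12.1-cp39-cp39-macosx_10_11_x86_64.whl",
    ):
        print(candidate, covers_macos_arm64(candidate))
```

If no published wheel passes a check like this for the current platform, pip downloads the `.tar.gz` and tries to build it, which is what the log above shows.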

Narsil commented 2 years ago

Hi @tekumara ,

Unfortunately, GitHub Actions doesn't support darwin arm runners yet (afaik), so all those prebuilt versions are built manually.

If you know how to build the prebuilt binaries automatically on GitHub, we welcome patches. In the meantime, if you have a Rust compiler installed, your install should work.
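
A quick way to check whether the source build has what it needs is to look for the Rust toolchain on the PATH, since the error in the log is "can't find Rust compiler". This is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes both `rustc` and `cargo` must be visible on the PATH for the build to succeed.

```python
import platform
import shutil
from typing import Dict, Optional


def describe_build_environment() -> Dict[str, Optional[str]]:
    """Collect the platform and the location of the Rust tools, if any."""
    return {
        "system": platform.system(),    # "Darwin" on macOS
        "machine": platform.machine(),  # "arm64" on Apple silicon
        "rustc": shutil.which("rustc"),  # None when not on the PATH
        "cargo": shutil.which("cargo"),
    }


def can_build_from_source(env: Dict[str, Optional[str]]) -> bool:
    """Assumption: the build needs both rustc and cargo on the PATH."""
    return env.get("rustc") is not None and env.get("cargo") is not None


if __name__ == "__main__":
    env = describe_build_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    if can_build_from_source(env):
        print("Rust toolchain found; a source build can be attempted.")
    else:
        print("Rust toolchain not found; see https://rustup.rs to install it.")
```

Run it with the same interpreter and shell session you use for `pip install`, because the PATH seen by pip is the one that matters.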

julien-c commented 2 years ago

@Narsil we could use a self-hosted GitHub Actions runner. They're very easy to deploy and set up (and we have some Mac mini machines in the data center, cc @mfuntowicz).

McPatate commented 2 years ago

@julien-c happy to take care of this

github-actions[bot] commented 9 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

tekumara commented 9 months ago

GitHub Actions now offers free M1 runners for open source projects:

https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/