huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

Cannot compile tokenizers on PowerPC 9 while installing transformers #604

Closed · kmeng01 closed this issue 3 years ago

kmeng01 commented 3 years ago

Environment info

Who can help

@mfuntowicz @n1t0

Information

I am trying to install transformers==3.4.0 on a PowerPC 9 system. It's an IBM compute rig for use by MIT.

To reproduce

Steps to reproduce the behavior:

  1. Create new conda environment with python 3.7
  2. Run pip install transformers==3.4.0 (the version that I need); the build then fails while compiling tokenizers:
Compiling tokenizers v0.10.1 (/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/tokenizers-lib)
       Running `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -Cembed-bitcode=no -C metadata=204c4d103d08e9e3 -C extra-filename=-204c4d103d08e9e3 --out-dir /tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps -L dependency=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps --extern clap=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libclap-b8e428690762cf7e.rmeta --extern derive_builder=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libderive_builder-247f4f57ff4bf4c7.so --extern esaxx_rs=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libesaxx_rs-28ce6f8a8d31c937.rmeta --extern indicatif=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libindicatif-280a1d33f346e384.rmeta --extern itertools=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libitertools-759131012594af62.rmeta --extern lazy_static=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/liblazy_static-0f749853bc34e9e0.rmeta --extern log=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/liblog-12a018fba7f0b36d.rmeta --extern onig=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libonig-3ca2736cdef653d2.rmeta --extern rand=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/librand-52622a6339ec540d.rmeta --extern rayon=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/librayon-f4508233e0c77565.rmeta --extern rayon_cond=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/librayon_cond-d89d0c7f0a1d1a11.rmeta --extern regex=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libregex-dbb55ca763c16a0e.rmeta --extern regex_syntax=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libregex_syntax-c7a8a1f28fe982ac.rmeta --extern serde=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libserde-11e7f5f85ab52b72.rmeta --extern serde_json=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libserde_json-477c52136da5fafe.rmeta --extern spm_precompiled=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libspm_precompiled-39a90f21c16965ef.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libunicode_normalization_alignments-157a660dec7f1476.rmeta --extern unicode_segmentation=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libunicode_segmentation-66856f91381ae1a4.rmeta --extern unicode_categories=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libunicode_categories-209e6f430e5d88d1.rmeta -L native=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/build/esaxx-rs-62ba703c44f19ac6/out -L 
native=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/build/onig_sys-091ecfe4b66243c7/out`
  error[E0603]: module `export` is private
     --> tokenizers-lib/src/tokenizer/mod.rs:24:12
      |
  24  | use serde::export::Formatter;
      |            ^^^^^^ private module
      |
  note: the module `export` is defined here
     --> /home/mengk/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.119/src/lib.rs:275:5
      |
  275 | use self::__private as export;
      |     ^^^^^^^^^^^^^^^^^^^^^^^^^

  error: aborting due to previous error

  For more information about this error, try `rustc --explain E0603`.
  error: could not compile `tokenizers`.

  Caused by:
    process didn't exit successfully: `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -Cembed-bitcode=no -C metadata=204c4d103d08e9e3 -C extra-filename=-204c4d103d08e9e3 --out-dir /tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps -L dependency=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps --extern clap=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libclap-b8e428690762cf7e.rmeta --extern derive_builder=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libderive_builder-247f4f57ff4bf4c7.so --extern esaxx_rs=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libesaxx_rs-28ce6f8a8d31c937.rmeta --extern indicatif=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libindicatif-280a1d33f346e384.rmeta --extern itertools=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libitertools-759131012594af62.rmeta --extern lazy_static=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/liblazy_static-0f749853bc34e9e0.rmeta --extern log=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/liblog-12a018fba7f0b36d.rmeta --extern onig=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libonig-3ca2736cdef653d2.rmeta --extern rand=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/librand-52622a6339ec540d.rmeta --extern rayon=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/librayon-f4508233e0c77565.rmeta --extern rayon_cond=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/librayon_cond-d89d0c7f0a1d1a11.rmeta --extern regex=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libregex-dbb55ca763c16a0e.rmeta --extern regex_syntax=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libregex_syntax-c7a8a1f28fe982ac.rmeta --extern serde=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libserde-11e7f5f85ab52b72.rmeta --extern serde_json=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libserde_json-477c52136da5fafe.rmeta --extern spm_precompiled=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libspm_precompiled-39a90f21c16965ef.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libunicode_normalization_alignments-157a660dec7f1476.rmeta --extern unicode_segmentation=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libunicode_segmentation-66856f91381ae1a4.rmeta --extern unicode_categories=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/deps/libunicode_categories-209e6f430e5d88d1.rmeta -L native=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/build/esaxx-rs-62ba703c44f19ac6/out -L 
native=/tmp/pip-install-4jlvfd19/tokenizers_3c1b3bbe26064417aa8614a7fb564203/target/release/build/onig_sys-091ecfe4b66243c7/out` (exit code: 1)
  cargo rustc --lib --manifest-path Cargo.toml --features pyo3/extension-module --release --verbose -- --crate-type cdylib
  error: cargo failed with code: 101

  ----------------------------------------
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly

Expected behavior

The installation should complete without any error messages.

Sidenote: before hitting this, I got an error complaining that Rust was not installed, but I had already installed it using the command given on the official website.
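
For context on the root cause: the E0603 error above comes from tokenizers v0.10.1 containing `use serde::export::Formatter;`, while serde 1.0.119 (the version resolved in this build, per the note pointing at serde-1.0.119/src/lib.rs:275) renamed its internal `export` module to `__private`, making the old path inaccessible. Below is a minimal sketch of the broken import and the standard-library replacement that fixes this class of error; the `Token` type is a made-up placeholder, not code from tokenizers:

    use std::fmt;
    // Broken in tokenizers v0.10.1 once serde >= 1.0.119 is resolved:
    //     use serde::export::Formatter;  // error[E0603]: module `export` is private
    // `serde::export` was only an internal re-export of standard-library
    // items, so the fix is to import the type from std directly:
    use std::fmt::Formatter;

    struct Token; // placeholder type for illustration

    impl fmt::Display for Token {
        fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
            write!(f, "<token>")
        }
    }

    fn main() {
        println!("{}", Token); // prints: <token>
    }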

kmeng01 commented 3 years ago

It seems this is a problem with transformers that has been addressed in a newer release: https://github.com/huggingface/transformers/issues/9668
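
If you are stuck on the old version, one workaround (untested here, and an assumption rather than something from this thread) is to build tokenizers from source with serde pinned below 1.0.119, where the `export` module was still public, e.g. by running `cargo update -p serde --precise 1.0.118` in the tokenizers Python bindings crate before `pip install .`. Upgrading, as suggested below, is the cleaner fix.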

DanBrown47 commented 6 months ago

@kmeng01 I'm getting the same error. How did you resolve it?

ArthurZucker commented 6 months ago

tokenizers v0.10.1 is a very old version, and transformers 3.4 is more than 4 years old.

ArthurZucker commented 6 months ago

Just update whatever you are using!
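
Concretely, that means installing a current release pair, e.g. `pip install -U transformers tokenizers`. Recent tokenizers sources no longer import `serde::export`, so they should build against current serde even on platforms such as ppc64le where a prebuilt wheel may not be available (wheel availability for that platform is an assumption worth checking).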