unitaryai / detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
https://www.unitary.ai/
Apache License 2.0
893 stars 115 forks

Error on PIP install #94

Closed victorbuttner closed 5 months ago

victorbuttner commented 9 months ago

Hi guys,

I'm having an issue when trying to install detoxify on my server. It was working before October 3rd, but now it no longer installs. Any suggestions as to what it could be? The error log is below:

0 73.28 Compiling tokenizers v0.12.1 (/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/tokenizers-lib)

0 73.28 Running /usr/local/rustup/toolchains/stable-aarch64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=2b3bdf60b3a9bd17 -C extra-filename=-2b3bdf60b3a9bd17 --out-dir /tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps -L dependency=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps --extern aho_corasick=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libaho_corasick-1161961579a7fbaf.rmeta --extern cached_path=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libcached_path-392d8189f99ab956.rmeta --extern clap=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libclap-ec5b6b800da37152.rmeta --extern derive_builder=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libderive_builder-52cb2ce693bcc6ff.so --extern dirs=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libdirs-c859ef8eb79fd52b.rmeta --extern esaxx_rs=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libesaxx_rs-6d487be5b79dad03.rmeta --extern indicatif=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libindicatif-4a0894c26dcdb80e.rmeta --extern itertools=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libitertools-45db50f37482ae6b.rmeta --extern 
lazy_static=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/liblazy_static-da3a3fc70106d81c.rmeta --extern log=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/liblog-7676fcd41cf23045.rmeta --extern macro_rules_attribute=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libmacro_rules_attribute-ca527a909e587baf.rmeta --extern onig=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libonig-fe5be2abd9e7e669.rmeta --extern paste=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libpaste-f3bb9e79856c2bc9.so --extern rand=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/librand-cbb21673889d40f2.rmeta --extern rayon=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/librayon-cb3de0d70d6c012c.rmeta --extern rayon_cond=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/librayon_cond-62eaf04d84da7477.rmeta --extern regex=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libregex-79d5fa7fff537334.rmeta --extern regex_syntax=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libregex_syntax-a4ff2a33a9b4bdd0.rmeta --extern reqwest=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libreqwest-44a99af7f47ea828.rmeta --extern serde=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libserde-c523f28c45eee2d1.rmeta --extern serde_json=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libserde_json-2525b9957620b74c.rmeta --extern spm_precompiled=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libspm_precompiled-96f54e5b623964f5.rmeta 
--extern thiserror=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libthiserror-1e9c82ea9c5dc027.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libunicode_normalization_alignments-fc056c1e05a2a888.rmeta --extern unicode_segmentation=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libunicode_segmentation-c500d969b247c3a7.rmeta --extern unicode_categories=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libunicode_categories-b0206357d88079eb.rmeta -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/bzip2-sys-d36c5a59d1012dd9/out/lib -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/zstd-sys-f5848455acb2dc56/out -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/esaxx-rs-0d23021379df770c/out -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/onig_sys-267649a52142423e/out

0 73.28 warning: variable does not need to be mutable

0 73.28 --> tokenizers-lib/src/models/unigram/model.rs:265:21

0 73.28 |

0 73.28 265 | let mut target_node = &mut best_path_ends_at[key_pos];

0 73.28 | ----^^^^^^^^^^^

0 73.28 | |

0 73.28 | help: remove this mut

0 73.28 |

0 73.28 = note: #[warn(unused_mut)] on by default

0 73.28

0 73.28 warning: variable does not need to be mutable

0 73.28 --> tokenizers-lib/src/models/unigram/model.rs:282:21

0 73.28 |

0 73.28 282 | let mut target_node = &mut best_path_ends_at[starts_at + mblen];

0 73.28 | ----^^^^^^^^^^^

0 73.28 | |

0 73.28 | help: remove this mut

0 73.28

0 73.28 warning: variable does not need to be mutable

0 73.28 --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:200:59

0 73.28 |

0 73.28 200 | encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {

0 73.28 | ----^^^^^^^

0 73.28 | |

0 73.28 | help: remove this mut

0 73.28

0 73.28 error: casting &T to &mut T is undefined behavior, even if the reference is unused, consider instead using an UnsafeCell

0 73.28 --> tokenizers-lib/src/models/bpe/trainer.rs:526:47

0 73.28 |

0 73.28 522 | let w = &words[*i] as *const _ as *mut _;

0 73.28 | -------------------------------- casting happend here

0 73.28 ...

0 73.28 526 | let word: &mut Word = &mut (*w);

0 73.28 | ^^^^^^^^^

0 73.28 |

0 73.28 = note: #[deny(invalid_reference_casting)] on by default

0 73.28

0 73.28 warning: tokenizers (lib) generated 3 warnings

0 73.28 error: could not compile tokenizers (lib) due to previous error; 3 warnings emitted

0 73.28

0 73.28 Caused by:

0 73.28 process didn't exit successfully: /usr/local/rustup/toolchains/stable-aarch64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=2b3bdf60b3a9bd17 -C extra-filename=-2b3bdf60b3a9bd17 --out-dir /tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps -L dependency=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps --extern aho_corasick=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libaho_corasick-1161961579a7fbaf.rmeta --extern cached_path=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libcached_path-392d8189f99ab956.rmeta --extern clap=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libclap-ec5b6b800da37152.rmeta --extern derive_builder=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libderive_builder-52cb2ce693bcc6ff.so --extern dirs=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libdirs-c859ef8eb79fd52b.rmeta --extern esaxx_rs=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libesaxx_rs-6d487be5b79dad03.rmeta --extern indicatif=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libindicatif-4a0894c26dcdb80e.rmeta --extern itertools=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libitertools-45db50f37482ae6b.rmeta --extern 
lazy_static=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/liblazy_static-da3a3fc70106d81c.rmeta --extern log=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/liblog-7676fcd41cf23045.rmeta --extern macro_rules_attribute=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libmacro_rules_attribute-ca527a909e587baf.rmeta --extern onig=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libonig-fe5be2abd9e7e669.rmeta --extern paste=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libpaste-f3bb9e79856c2bc9.so --extern rand=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/librand-cbb21673889d40f2.rmeta --extern rayon=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/librayon-cb3de0d70d6c012c.rmeta --extern rayon_cond=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/librayon_cond-62eaf04d84da7477.rmeta --extern regex=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libregex-79d5fa7fff537334.rmeta --extern regex_syntax=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libregex_syntax-a4ff2a33a9b4bdd0.rmeta --extern reqwest=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libreqwest-44a99af7f47ea828.rmeta --extern serde=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libserde-c523f28c45eee2d1.rmeta --extern serde_json=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libserde_json-2525b9957620b74c.rmeta --extern spm_precompiled=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libspm_precompiled-96f54e5b623964f5.rmeta 
--extern thiserror=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libthiserror-1e9c82ea9c5dc027.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libunicode_normalization_alignments-fc056c1e05a2a888.rmeta --extern unicode_segmentation=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libunicode_segmentation-c500d969b247c3a7.rmeta --extern unicode_categories=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/deps/libunicode_categories-b0206357d88079eb.rmeta -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/bzip2-sys-d36c5a59d1012dd9/out/lib -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/zstd-sys-f5848455acb2dc56/out -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/esaxx-rs-0d23021379df770c/out -L native=/tmp/pip-install-9bndnzr_/tokenizers_c1ebe93b503644d592e7f31c1c5f0cea/target/release/build/onig_sys-267649a52142423e/out (exit status: 1)

0 73.28 error: cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib -- failed with code 101

0 73.28 [end of output]

0 73.28

0 73.28 note: This error originates from a subprocess, and is likely not a problem with pip.

0 73.28 ERROR: Failed building wheel for tokenizers

0 73.29 Building wheel for future (setup.py): started

0 73.96 Building wheel for future (setup.py): finished with status 'done'

0 73.96 Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492022 sha256=6b67d3bea87249126f82cf6102d5ff1989090406b4640cbd40ae4d57d7328cfc

0 73.96 Stored in directory: /tmp/pip-ephem-wheel-cache-rpaixv_x/wheels/da/19/ca/9d8c44cd311a955509d7e13da3f0bea42400c469ef825b580b

0 73.97 Successfully built whisper simplejson future

0 73.97 Failed to build tokenizers

0 73.97 ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

0 74.23

0 74.23 [notice] A new release of pip is available: 23.1.2 -> 23.2.1

0 74.23 [notice] To update, run: pip install --upgrade pip

dcferreira commented 8 months ago

I couldn't reproduce this error; could you give more details? Could you please specify which operating system and Python version you're using, and maybe the output of pip freeze if you're using pip?

Toffaa commented 8 months ago

Hi,

I faced the same issue. It is due to the older version (0.12.1) of tokenizers used by detoxify, which does not ship a prebuilt wheel for Python 3.11 or higher and fails to compile from source.

The issue seems to be fixed in a newer version of tokenizers, according to this.

As a workaround, I downgraded my conda env to Python 3.10.

Maybe the dependencies of Detoxify should be updated?
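The compatibility rule described above can be sketched as a small check. This is a hedged sketch, not part of detoxify: the helper name and the (3, 11) cutoff are assumptions taken from this thread, where tokenizers 0.12.1 has prebuilt wheels only up to Python 3.10 and pip falls back to the failing source build on newer interpreters.

```python
# Sketch of the rule reported in this thread: tokenizers 0.12.1 ships
# prebuilt wheels only for Python <= 3.10, so on 3.11+ pip must compile
# it from source, which is where the rustc error appears.
def tokenizers_0_12_1_needs_source_build(python_version: tuple) -> bool:
    """Return True if pip would have to compile tokenizers 0.12.1 from source."""
    return tuple(python_version[:2]) >= (3, 11)

print(tokenizers_0_12_1_needs_source_build((3, 11, 4)))   # source build -> fails
print(tokenizers_0_12_1_needs_source_build((3, 10, 12)))  # prebuilt wheel available
```

Downgrading the environment to 3.9 or 3.10, as suggested above, sidesteps the source build entirely.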

atrifat commented 8 months ago

Confirmed, what @Toffaa said is true. @victorbuttner, you can downgrade to Python 3.9 or Python 3.10 in your virtual environment as a workaround.

dcferreira commented 8 months ago

I don't get this error on macOS, but I do get it in WSL (Ubuntu on Windows).

I noticed that another workaround is to install from master (which gave me tokenizers==0.13.3) with pip install git+https://github.com/unitaryai/detoxify.
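After applying either workaround, a quick sanity check can confirm that a new enough tokenizers landed. This is a sketch under the assumption (from this thread) that 0.13.x no longer needs the failing source build; the helper names are made up for illustration and the version parse is naive.

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    # Naive parse: assumes purely numeric dotted components like "0.13.3"
    return tuple(int(p) for p in v.split("."))

def fixed_tokenizers(v: str) -> bool:
    # 0.13.x (what installing from master pulled in) avoids the source build
    return parse_version(v)[:2] >= (0, 13)

try:
    print("installed tokenizers OK:", fixed_tokenizers(version("tokenizers")))
except PackageNotFoundError:
    print("tokenizers is not installed")
```

Running this after the git install should report the installed release as OK; under the pinned 0.12.1 it would not.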

DanielDaCosta commented 7 months ago

I encountered a similar error on macOS Monterey Version 12.5:

ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

@dcferreira's solution of installing from master also worked: pip install git+https://github.com/unitaryai/detoxify

laurahanu commented 5 months ago

Updated the PyPI version now as well, so this should be fixed!