baptisterajaut closed this issue 7 months ago
I can't install Tokenizers either. It is installed as a dependency of the AllenNLP package, and the new version of the Rust compiler breaks it.
Installing Rust from the Rust website with their shell script installs 1.73.0, I presume, and breaks the Tokenizers compilation, but installing it via Homebrew installs 1.72.1, which works.
Which version are you using?
This was already fixed on main and in 0.14.1:
https://github.com/huggingface/tokenizers/blob/main/tokenizers/src/models/bpe/trainer.rs#L541-L546
To work around this error, I installed transformers with conda, using the command 'conda install -c huggingface transformers'. Then it worked.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
I have the same problem with Python 3.11. Do you need more information about this issue?
@DavidAdamczyk Use a more recent tokenizers version, or an older Rust compiler version.
I'm using the latest version of tokenizers and the most recent stable Rust compiler, and I followed the installation instructions available here. Could someone update the installation instructions to include the supported versions of all dependencies?
Hi, the same error happened to me. I am trying to install transformers v4.6.1 on a PYNQ-Z2 board (v2.5, armv7l) with Rust v1.74.1.
Edit: The workaround is to use an older Rust version (this is what I did):
1) Install Rust v1.72.1 and set it as the default:
rustup default 1.72.1
2) Remove the stable toolchain, or set an environment variable so that compilation does not use stable:
rustup toolchain remove stable
or
export RUSTUP_TOOLCHAIN=1.72.1
After this, it should work properly.
pip3 install transformers==4.15.0 timm==0.4.12 fairscale==0.4.4
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib\src\models\bpe\trainer.rs:517:47
|
513 | let w = &words[*i] as *const _ as *mut _;
| -------------------------------- casting happend here
...
517 | let word: &mut Word = &mut (*w);
| ^^^^^^^^^
|
= note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
= note: `#[deny(invalid_reference_casting)]` on by default
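The lint rejects this code because a `&mut Word` is manufactured from a shared `&Word`. This is not the actual tokenizers fix (that is behind the main-branch link earlier in the thread); it is a minimal std-only sketch, assuming the set of word positions to update is known, of one sound way to mutate selected words: take a unique `&mut [Word]` borrow and walk it with `iter_mut()`. `Word`, `merge`, and `merge_at` are hypothetical stand-ins, not the library's types.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the trainer's word type.
#[derive(Debug)]
struct Word {
    symbols: Vec<u32>,
}

impl Word {
    // Hypothetical merge step: replace every adjacent (a, b) pair with `new_id`.
    fn merge(&mut self, a: u32, b: u32, new_id: u32) {
        let mut out = Vec::with_capacity(self.symbols.len());
        let mut i = 0;
        while i < self.symbols.len() {
            if i + 1 < self.symbols.len() && self.symbols[i] == a && self.symbols[i + 1] == b {
                out.push(new_id);
                i += 2;
            } else {
                out.push(self.symbols[i]);
                i += 1;
            }
        }
        self.symbols = out;
    }
}

// Mutate only the words whose index is in `positions`, using a unique (&mut)
// borrow of the whole slice instead of casting &Word to &mut Word.
fn merge_at(words: &mut [Word], positions: &HashSet<usize>, a: u32, b: u32, new_id: u32) {
    for (idx, word) in words.iter_mut().enumerate() {
        if positions.contains(&idx) {
            word.merge(a, b, new_id);
        }
    }
}

fn main() {
    let mut words = vec![
        Word { symbols: vec![1, 2, 3] },
        Word { symbols: vec![1, 2, 1, 2] },
        Word { symbols: vec![1, 2] },
    ];
    let positions: HashSet<usize> = HashSet::from([0, 1]);
    merge_at(&mut words, &positions, 1, 2, 9);
    println!("{:?}", words);
}
```

If the merges need to run in parallel, the rayon crate's `par_iter_mut()` offers the same unique-borrow approach; this sketch stays sequential to avoid the extra dependency.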
Running into this tonight too.
Requirement already satisfied: requests in c:\users\dhorner\anaconda3\envs\hotz\lib\site-packages (from transformers==4.15.0->-r requirements.txt (line 2)) (2.31.0)
Collecting sacremoses (from transformers==4.15.0->-r requirements.txt (line 2))
  Using cached sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)
Collecting tokenizers<0.11,>=0.10.1 (from transformers==4.15.0->-r requirements.txt (line 2))
  Using cached tokenizers-0.10.3.tar.gz (212 kB)
The solution for me was to set RUSTFLAGS=-A invalid_reference_casting; that worked with Rust 1.75.0.
I also ran into this issue last week while installing transformers==4.22.1, which was pinned by a different project; tokenizers resolved to v0.12.1. The platform was macOS Sonoma on an M2 chip.
I worked around it by running:
export RUSTFLAGS="-A invalid_reference_casting"
...before installing, but it would be great if the problem could be fixed at the source!
I would love to help resolve this with something better than an environment flag.
tokenizers-lib/src/models/bpe/trainer.rs:526
I do not see tokenizers-lib in the tree.
rg "let w = &words[*i] as *const _ as *mut _;"
finds nothing
The error guidance is not clear. GPT says: This error message indicates that you're attempting to cast a shared reference (&T) into a mutable reference (&mut T), which is considered undefined behavior in Rust, even if the mutable reference is not actually used. Rust's safety guarantees rely on preventing such unsound operations.
To resolve this issue, you should use appropriate safe patterns for mutable access, such as Cell, RefCell, or UnsafeCell for interior mutability, depending on your specific use case.
In your case, since you're dealing with mutable access to data through raw pointers, you should consider using UnsafeCell. Here's how you can adjust your code:
use std::cell::UnsafeCell;
// Word stands in for the type being mutated (fields simplified here)
struct Word {
    symbols: Vec<u32>,
}
fn main() {
    // Store each Word inside an UnsafeCell from the start; casting an existing
    // &Word to *mut UnsafeCell<Word> after the fact is still undefined behavior.
    let words: Vec<UnsafeCell<Word>> = vec![UnsafeCell::new(Word { symbols: vec![1, 2] })];
    // i is some index into the words vector
    let i = 0;
    // SAFETY: no other reference to words[i] is alive while word_mut is in use
    let word_mut: &mut Word = unsafe { &mut *words[i].get() };
    word_mut.symbols.push(3);
}
However, using UnsafeCell requires careful handling as it bypasses Rust's safety checks. Make sure you understand the implications of using UnsafeCell and ensure that your code is correct and safe.
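The quoted answer also names Cell and RefCell as safe alternatives. A minimal sketch, assuming the words must be mutated through a shared slice: each hypothetical `Word` is wrapped in a `RefCell`, so `borrow_mut()` checks exclusivity at runtime, and a `Cell<usize>` counts edits. Unlike the `unsafe` version, an overlapping mutable borrow is reported instead of being undefined behavior. `Word` and `push_symbol` are illustrative names only.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical word type; only `symbols` matters for this sketch.
#[derive(Debug)]
struct Word {
    symbols: Vec<u32>,
}

// Mutates a Word through a *shared* slice: RefCell enforces at runtime
// that at most one mutable borrow of each element exists at a time.
fn push_symbol(words: &[RefCell<Word>], i: usize, sym: u32, edits: &Cell<usize>) {
    words[i].borrow_mut().symbols.push(sym);
    // Cell provides interior mutability for Copy values without borrowing.
    edits.set(edits.get() + 1);
}

fn main() {
    let words = vec![
        RefCell::new(Word { symbols: vec![1] }),
        RefCell::new(Word { symbols: vec![2] }),
    ];
    let edits = Cell::new(0);
    push_symbol(&words, 0, 7, &edits);
    push_symbol(&words, 1, 8, &edits);
    println!("{:?} after {} edits", words, edits.get());

    // A second, overlapping mutable borrow is rejected instead of causing UB.
    let _first = words[0].borrow_mut();
    assert!(words[0].try_borrow_mut().is_err());
}
```

RefCell and Cell are not Sync, so this only fits single-threaded code; shared mutation across threads would need something like Mutex instead.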
So, Rustonomicon territory.
Could someone point me to where this code is? I don't know where it lives.
I'll close this, as I believe the latest releases no longer have this issue.
As stated, this commit breaks building tokenizers on modern toolchains, even stable:
% rustc -V
rustc 1.73.0 (cc66ad468 2023-10-03)