Open kallebysantos opened 2 months ago
I'm seeing the same error with Python when trying to read the tokenizer from Xenova/speecht5_tts.
wget https://huggingface.co/Xenova/speecht5_tts/resolve/main/tokenizer.json
from tokenizers import Tokenizer
Tokenizer.from_file("tokenizer.json")
thread '<unnamed>' panicked at /Users/runner/work/tokenizers/tokenizers/tokenizers/src/normalizers/mod.rs:143:26:
Precompiled: Error("invalid type: null, expected a borrowed string", line: 1, column: 28)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
...
pyo3_runtime.PanicException: Precompiled: Error("invalid type: null, expected a borrowed string", line: 1, column: 28)
With Tokenizers 0.19.0, this raised an error which could be handled rather than a panic. It looks like this may be related to #1604.
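For anyone who needs to keep a process alive until this is fixed, here is a minimal sketch of catching the panic from Python. PyO3's `PanicException` derives from `BaseException`, so a plain `except Exception` will not catch it. The helper name `try_load_tokenizer` and its `loader` parameter are hypothetical and only make the sketch easy to exercise without the library installed.

```python
def try_load_tokenizer(path, loader=None):
    """Return (tokenizer, None) on success or (None, error) on failure.

    `loader` is a hypothetical injection point; by default it falls back
    to tokenizers.Tokenizer.from_file.
    """
    if loader is None:
        from tokenizers import Tokenizer  # imported lazily on purpose
        loader = Tokenizer.from_file
    try:
        return loader(path), None
    except (KeyboardInterrupt, SystemExit):
        # Never swallow interpreter-level signals.
        raise
    except BaseException as exc:
        # A Rust panic surfaces as pyo3_runtime.PanicException,
        # which is a BaseException subclass.
        return None, exc
```

Calling `try_load_tokenizer("tokenizer.json")` on the file above should then return the panic as the second element instead of propagating it. This only hides the symptom; the tokenizer still fails to load.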
I'm also facing the same issue (#1645) with speecht5_tts.
I think passing a `""` might work. cc @xenova, not sure why you end up with nulls there, but we can probably sync and I can add support for option!
Xenova's implementation doesn't read the value directly; it iterates over the normalizers in the config, so I think it ignores null values.
I agree with you, adding support for `Option<>` may solve it.
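Until `Option<>` support lands, the `""` suggestion above can be tried without editing the file on disk. This is a minimal sketch that patches the JSON in memory. The function names and the `mode` parameter are hypothetical, and I have not verified that the library accepts an empty `precompiled_charsmap`, so there is also a `"drop"` mode that removes the normalizer entirely (which may change normalization).

```python
import json

def patch_null_charsmap(node, mode="empty"):
    """Fix Precompiled normalizers whose charsmap is null or missing.

    mode="empty": replace the null with "" (the suggestion in this thread).
    mode="drop":  remove the normalizer (returns None for that node).
    """
    if not isinstance(node, dict):
        return node  # covers a top-level "normalizer": null
    if node.get("type") == "Precompiled" and node.get("precompiled_charsmap") is None:
        if mode == "drop":
            return None
        return {**node, "precompiled_charsmap": ""}
    if node.get("type") == "Sequence":
        # Recurse into nested normalizers and filter out dropped ones.
        children = [patch_null_charsmap(c, mode) for c in node.get("normalizers", [])]
        return {**node, "normalizers": [c for c in children if c is not None]}
    return node

def patch_tokenizer_json(text, mode="empty"):
    """Return tokenizer.json text with the normalizer section patched."""
    config = json.loads(text)
    config["normalizer"] = patch_null_charsmap(config.get("normalizer"), mode)
    return json.dumps(config, ensure_ascii=False)
```

The patched text could then be passed to `Tokenizer.from_str(...)` instead of calling `Tokenizer.from_file(...)`, which leaves the downloaded model files unchanged.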
I've implemented spm_precompiled with null support at vicantwin/spm_precompiled, which includes a test with null support, and all tests pass successfully.
However, I need some help changing this repository, as I'm not entirely familiar with the codebase and am unsure how to implement the necessary changes. Any help would be greatly appreciated.
Hi guys, I'm currently working on https://github.com/supabase/edge-runtime/pull/368, which intends to add a Rust implementation of `pipeline()`.
While I was coding the `translation` task, I found that I can't load the `Tokenizer` instance for the `Xenova/opus-mt-en-fr` `onnx` model and their other `opus-mt-*` variants.
I got the following:
```rust
use std::path::Path;
use tokenizers::Tokenizer;

let tokenizer_path = Path::new("opus-mt-en-fr/tokenizer.json");
let tokenizer = Tokenizer::from_file(tokenizer_path).unwrap();
```
```
thread 'main' panicked at /home/kalleby/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.20.0/src/normalizers/mod.rs:143:26:
Precompiled: Error("invalid type: null, expected a borrowed string", line: 1, column: 28)
stack backtrace:
   0: rust_begin_unwind
             at /rustc/80eb5a8e910e5185d47cdefe3732d839c78a5e7e/library/std/src/panicking.rs:662:5
   1: core::panicking::panic_fmt
             at /rustc/80eb5a8e910e5185d47cdefe3732d839c78a5e7e/library/core/src/panicking.rs:74:14
   2: core::result::unwrap_failed
             at /rustc/80eb5a8e910e5185d47cdefe3732d839c78a5e7e/library/core/src/result.rs:1679:5
   3: core::result::Result
```
I know that it occurs because their `tokenizer.json` file was the following:
opus-mt-en-fr:
The expected content should be something like this:
nllb-200-distilled-600M:
Looking at the original Helsinki-NLP/opus-mt-en-fr, I notice that there is no `tokenizer.json` file for it.
I would like to know: is `precompiled_charsmap` necessarily expected to be non-null?
Is there some workaround to run these models without changing the internal model files? How can I handle an exported `onnx` model that doesn't have a `tokenizer.json` file?