mlc-ai / tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece
Apache License 2.0

Compiler error (Rust) #10

Closed: jklaise closed this issue 10 months ago

jklaise commented 10 months ago

I tried compiling this as part of the MLC runtime build process (step 2) but encountered the error below. I then tried to compile following the example directory in this repo and got the same error. The Rust version used is 1.71.1.

```
Compiling tokenizers-c v0.1.0 (/home/janis/src/tokenizers-cpp/rust)
error[E0308]: mismatched types
   --> src/lib.rs:93:49
    |
93  |         self.decode_str = self.tokenizer.decode(ids, skip_special_tokens).unwrap();
    |                                          ------ ^^^ expected `&[u32]`, found `Vec<u32>`
    |                                          |
    |                                          arguments to this method are incorrect
    |
    = note: expected reference `&[u32]`
                  found struct `Vec<u32>`
note: method defined here
   --> /home/janis/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.13.4/src/tokenizer/mod.rs:814:12
    |
814 |     pub fn decode(&self, ids: &[u32], skip_special_tokens: bool) -> Result<String> {
    |            ^^^^^^
help: consider borrowing here
    |
93  |         self.decode_str = self.tokenizer.decode(&ids, skip_special_tokens).unwrap();
    |                                                 +

For more information about this error, try `rustc --explain E0308`.
error: could not compile `tokenizers-c` (lib) due to previous error
make[2]: *** [tokenizers/CMakeFiles/tokenizers_c.dir/build.make:71: tokenizers/release/libtokenizers_c.a] Error 101
make[1]: *** [CMakeFiles/Makefile2:218: tokenizers/CMakeFiles/tokenizers_c.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
```
jmfirth commented 10 months ago

Just ran into this exact issue while trying to build the Tokenizer runtime dependency for the iOS app in https://mlc.ai/mlc-llm/docs/deploy/ios.html

tqchen commented 10 months ago

Thanks for reporting. This seems to be due to the latest update in tokenizers; https://github.com/mlc-ai/tokenizers-cpp/pull/11 should fix it.
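For context, the error is the standard E0308 mismatch between an owned `Vec<u32>` and the `&[u32]` slice parameter that `Tokenizer::decode` takes in tokenizers 0.13.4, and the compiler's own suggestion (borrow with `&`) is the one-character fix. A minimal standalone sketch of the pattern, where `decode` is a hypothetical stand-in with the same `&[u32]` signature, not the actual tokenizers API:

```rust
// Stand-in for a function with the same parameter shape as
// tokenizers' `Tokenizer::decode(&self, ids: &[u32], skip_special_tokens: bool)`.
fn decode(ids: &[u32], skip_special_tokens: bool) -> String {
    let _ = skip_special_tokens; // unused in this sketch
    ids.iter()
        .map(|id| id.to_string())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let ids: Vec<u32> = vec![101, 2023, 102];

    // `decode(ids, true)` fails with E0308: expected `&[u32]`, found `Vec<u32>`.
    // Borrowing gives `&Vec<u32>`, which deref-coerces to `&[u32]`:
    let s = decode(&ids, true);
    println!("{}", s); // prints "101 2023 102"
}
```

This also matches why the breakage appeared only after updating the tokenizers dependency: an API that previously accepted an owned `Vec` (or a generic `Into` parameter) and now takes a slice will break callers that pass the `Vec` by value, while `&ids` works against both shapes that borrow.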

tqchen commented 10 months ago

fixed by https://github.com/mlc-ai/tokenizers-cpp/pull/11