huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
9.04k stars, 799 forks

Can't install tokenizers 0.9.2 to reproduce VideoCLIP #1062

Closed: vineetparikh closed this issue 2 years ago

vineetparikh commented 2 years ago

I'm trying to reproduce VideoCLIP locally. It relies on transformers 3.4 (it can't use newer versions because of API-incompatible changes), which in turn relies on tokenizers 0.9.2. I can't install tokenizers 0.9.2 via pip because the Rust ndarray crate it pulls in fails to compile with a set of errors, all of which have the form

      error[E0277]: the trait bound `<S as data_traits::DataOwned>::MaybeUninit: data_traits::RawDataSubst<A>` is not satisfied
         --> /home/vap43/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.6/src/impl_methods.rs:276:12
          |
      276 |         S: DataOwned,
          |            ^^^^^^^^^ the trait `data_traits::RawDataSubst<A>` is not implemented for `<S as data_traits::DataOwned>::MaybeUninit`
          |
         ::: /home/vap43/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.6/src/data_traits.rs:519:18
          |
      519 | pub unsafe trait DataOwned: Data {
          |                  --------- required by a bound in this
      ...
      522 |         + RawDataSubst<Self::Elem, Output=Self>;
          |           ------------------------------------- required by this bound in `data_traits::DataOwned`
          |
      help: consider further restricting the associated type
          |
      276 |         S: DataOwned, <S as data_traits::DataOwned>::MaybeUninit: data_traits::RawDataSubst<A>
          |                     

when I run pip install transformers==3.4 or pip install tokenizers==0.9.2.

It seems as if the Rust crate might be broken? When I pull ndarray from source and build tag 0.15.6, it builds successfully locally. I'm using rustc 1.46, but later versions (including the default stable and nightly) seem to have the same problem. What would you recommend I try next?

Narsil commented 2 years ago

Tbh I cannot really help you easily here. This version is quite old.

Did you try pulling the Rust sources and running the tests directly on them (cd tokenizers/tokenizers && make test)? Then move on to the Python bindings in bindings/python.

That should help figure out which one is the culprit. Also, what's your platform?
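For readers who need to answer the platform question, here is a minimal sketch that gathers the relevant details using only Python's standard library. The function name `describe_environment` and the dictionary keys are made up for illustration; the interpreter tag assumes CPython.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect the details that influence which wheel pip can select."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        # Assumption: CPython, whose wheels are tagged cp<major><minor>
        # (for example cp38 for Python 3.8).
        "interpreter_tag": f"cp{sys.version_info.major}{sys.version_info.minor}",
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting this output into a report makes it easier to tell whether pip is likely to find a prebuilt wheel or fall back to compiling from source.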

vineetparikh commented 2 years ago

^ Ah, it looks like I could've just used the prebuilt py3.8 wheel instead of trying to get this to work with py3.10; doing that ended up working. Thanks for the help!
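The fix above comes down to whether a prebuilt wheel matches the running interpreter; when none does, pip falls back to building from source, which is where the Rust compile errors appeared. A rough sketch of that check follows. It assumes the standard wheel filename layout (name-version-python tag-abi tag-platform tag, with an optional build tag), handles only CPython tags, and uses an illustrative filename rather than one confirmed from the package index. The helper names are hypothetical.

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag from a wheel filename (standard wheel naming)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return parts[-3]


def matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """Check whether a wheel's Python tag covers the given CPython version."""
    wanted = f"cp{version_info[0]}{version_info[1]}"
    # A tag may be a dot-separated set such as "cp37.cp38".
    return wanted in wheel_python_tag(filename).split(".")


# Illustrative filename only; not verified against the actual release files.
example = "tokenizers-0.9.2-cp38-cp38-manylinux1_x86_64.whl"
print(matches_interpreter(example, (3, 8)))
print(matches_interpreter(example, (3, 10)))
```

A real installer also compares the ABI and platform tags, so this is a simplification meant only to show why switching to Python 3.8 avoided the source build.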

Narsil commented 2 years ago

Glad you figured it out. And thanks for sharing your solution, I'm sure it'll help readers! :D