huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

"make bench" command does not download all required resources #1425

Closed: zamazan4ik closed this issue 5 months ago

zamazan4ik commented 6 months ago

Hi!

I checked out the main branch at commit f1c23b868006ee27acdd31796677f82fa10d6bd7, changed into the tokenizers directory, and ran make bench. Instead of a successful benchmark run, I got the following error:

     Running benches/layout_benchmark.rs (target/release/deps/layout_benchmark-1f68f2f9387775b7)
thread 'main' panicked at benches/layout_benchmark.rs:28:80:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `--bench layout_benchmark`

The cause is that make bench does not download the Albert test resource that layout_benchmark loads (see the unwrap at benches/layout_benchmark.rs:28 in the panic above).

This can be worked around locally by running make test before make bench, since make test downloads all the required files. However, I think the proper fix is different: make bench should be self-contained and download every resource it needs.
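
Separately from the Makefile fix, the panic itself gives no hint about which file is missing. Below is a minimal sketch of how a benchmark could report a missing resource more clearly. It is not the repository's code: the helper name open_resource is hypothetical, and the suggested command in the message is simply the workaround described above.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of the tokenizers crate): opens a benchmark
/// resource and, if the file does not exist, replaces the bare OS error with
/// a message that names the file and suggests the workaround.
fn open_resource(path: &Path) -> io::Result<File> {
    File::open(path).map_err(|err| {
        if err.kind() == io::ErrorKind::NotFound {
            io::Error::new(
                io::ErrorKind::NotFound,
                format!(
                    "benchmark resource {} is missing; run `make test` first to download it",
                    path.display()
                ),
            )
        } else {
            // Any other failure (permissions, I/O) is passed through unchanged.
            err
        }
    })
}
```

A benchmark that calls open_resource(...) and unwraps the result would still stop, but the panic would name the missing file instead of showing only "No such file or directory".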