huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
9.09k stars, 810 forks

README.md contains non-functional code #1633

Open ahenkes1 opened 2 months ago

ahenkes1 commented 2 months ago

The README.md (and the corresponding landing page of the documentation) contains non-functional code. Specifically the following:

Loading a pretrained tokenizer from the Hub

use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    # #[cfg(feature = "http")]
    # {
        let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?;

        let encoding = tokenizer.encode("Hey there!", false)?;
        println!("{:?}", encoding.get_tokens());
    # }
    Ok(())
}

Here, the function that loads a pretrained tokenizer is not found on 'Tokenizer', so the build fails:

error[E0599]: no function or associated item named `from_pretrained` found for struct `Tokenizer` in the current scope
   --> src/main.rs:30:32
    |
30  |     let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None);
    |                                ^^^^^^^^^^^^^^^ function or associated item not found in `Tokenizer`

ArthurZucker commented 2 months ago

Hey! You have to enable the `http` feature for that 🤗 Should we clarify the docs about this?

ahenkes1 commented 2 months ago

Hey! That makes sense! A small hint wouldn't hurt ;)
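
For anyone hitting the same error, the hint could look roughly like the sketch below. It assumes the feature is named `http`, as the `#[cfg(feature = "http")]` line in the README snippet suggests. The version number is only a placeholder and not something stated in this thread.

```toml
[dependencies]
# Placeholder version (an assumption): keep whatever version you already depend on.
# The "http" feature is what the README snippet gates `Tokenizer::from_pretrained` behind.
tokenizers = { version = "0.20", features = ["http"] }
```

With the feature enabled, the lines prefixed with `# ` in the README snippet still need to be dropped when copying it into `src/main.rs`. They appear to be rustdoc hidden-line markers, which are not valid Rust on their own.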