huggingface / transformers

đŸ¤— Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ValueError: Couldn't instantiate the backend tokenizer while loading model tokenizer #9750

Closed. rsanjaykamath closed this issue 3 years ago.

rsanjaykamath commented 3 years ago

Environment info

Who can help

@mfuntowicz @patrickvonplaten

Information

Model I am using (Bert, XLNet ...): T5

The problem arises when using:

To reproduce

Steps to reproduce the behavior:

  1. Follow the instructions here https://github.com/allenai/unifiedqa to get the sample code
  2. Copy-paste it into Colab and run it.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):
    # Tokenize the input, generate with the T5 model, and decode the answer
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)
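
For reference, the helper can then be called like this (a hypothetical invocation; the question string and expected output are my own illustration, not from the original report):

run_model("which is the capital of france?")
# should return the decoded answer as a list of strings, e.g. something like ['paris']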

Expected behavior

The code above should load the model without errors.

Error

Instead, the following error is raised:

ValueError                                Traceback (most recent call last)
<ipython-input-4-ee10e1c1c77e> in <module>()
      2 
      3 model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
----> 4 tokenizer = AutoTokenizer.from_pretrained(model_name)
      5 model = T5ForConditionalGeneration.from_pretrained(model_name)
      6 

(4 intermediate frames hidden)
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
     94         else:
     95             raise ValueError(
---> 96                 "Couldn't instantiate the backend tokenizer from one of: "
     97                 "(1) a `tokenizers` library serialization file, "
     98                 "(2) a slow tokenizer instance to convert or "

ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
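
The last line of the traceback already points at the root cause. A quick way to confirm it in the same environment (my own diagnostic sketch, not part of the original report):

# If this import fails with ModuleNotFoundError, transformers cannot
# convert the slow T5 tokenizer into a fast one
import sentencepiece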
patrickvonplaten commented 3 years ago

Hey @rsanjaykamath,

I cannot reproduce the error on master. When running:

from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

I don't encounter any errors. Could you update transformers to the newest version and try again?
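
In a Colab notebook the upgrade would look roughly like this (a sketch, assuming a standard pip setup; restart the runtime afterwards so the new version is actually imported):

!pip install --upgrade transformers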

rsanjaykamath commented 3 years ago

Hi @patrickvonplaten ,

That's strange. I just tried it on Colab with transformers 4.2.2 and the same error occurred. Have you tried it on Colab or on a local machine?

patrickvonplaten commented 3 years ago

I see, it's the classic sentencepiece error - I should have read your error message more carefully ;-)

Here is a Colab showing that it works: https://colab.research.google.com/drive/1QybYdj-1bW0MHD0cutWBPWas5IFEhSjC?usp=sharing

patrickvonplaten commented 3 years ago

Also see: https://github.com/huggingface/transformers/issues/8963

rsanjaykamath commented 3 years ago

OK, got it. Installing sentencepiece and restarting the kernel did the trick for me.
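
For anyone else hitting this on Colab, the fix amounts to something like the following (a sketch; the kernel restart matters because modules already imported in the session won't see the newly installed package):

!pip install sentencepiece
# then restart the runtime and re-run the imports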

Thanks for your help :) Closing the issue.

NourEldin-Osama commented 1 year ago

I think the error message should be clearer.

trexanhvnn commented 9 months ago

> I see, it's the classic sentencepiece error - I should have read your error message more carefully ;-)
>
> Here is a Colab showing that it works: https://colab.research.google.com/drive/1QybYdj-1bW0MHD0cutWBPWas5IFEhSjC?usp=sharing


pb6192 commented 2 months ago

In case it helps someone: I got this error because a file in my Llama 3 model download was corrupted or missing. Downloading the model again fixed it.
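
One way to force a clean re-download instead of reusing a possibly corrupted cache (a sketch; force_download is a real from_pretrained option, but the model id here is only illustrative):

from transformers import AutoTokenizer

# force_download=True re-fetches the files rather than reading the local cache
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B", force_download=True)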