huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Protobuf #10020

Closed chschoenenberger closed 3 years ago

chschoenenberger commented 3 years ago

Who can help

@thomwolf @LysandreJik

Information

Model I am using (Bert, XLNet ...): T-Systems-onsite/cross-en-de-roberta-sentence-transformer

To reproduce

Steps to reproduce the behavior:

  1. Create a new empty project with pipenv
  2. Install sentence-transformers
  3. Call SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
Traceback (most recent call last):
  File "C:/Source/pythonProject/main.py", line 4, in <module>
    SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 87, in __init__
    transformer_model = Transformer(model_name_or_path)
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\sentence_transformers\models\Transformer.py", line 31, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 385, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\tokenization_utils_base.py", line 1768, in from_pretrained
    return cls._from_pretrained(
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\tokenization_utils_base.py", line 1841, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\models\xlm_roberta\tokenization_xlm_roberta_fast.py", line 133, in __init__
    super().__init__(
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\tokenization_utils_fast.py", line 89, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\convert_slow_tokenizer.py", line 659, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\convert_slow_tokenizer.py", line 301, in __init__
    requires_protobuf(self)
  File "C:\Users\chrs\.virtualenvs\pythonProject-WdXdK-Rq\lib\site-packages\transformers\file_utils.py", line 467, in requires_protobuf
    raise ImportError(PROTOBUF_IMPORT_ERROR.format(name))
ImportError: 
XLMRobertaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment.
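
The traceback shows the fast XLM-R tokenizer being built by converting the slow, sentencepiece-based tokenizer; that conversion path is what requires protobuf. As a quick diagnostic (a minimal sketch added for illustration, not part of the original report), one can check whether both optional dependencies are importable in the active interpreter:

import importlib

# Mirrors what transformers' availability guards effectively test:
# whether these optional packages can be imported at all.
for pkg in ("sentencepiece", "google.protobuf"):
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: found")
    except ImportError:
        print(f"{pkg}: MISSING")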

Expected behavior

Somehow the protobuf dependency doesn't get installed properly with Pipenv, and when I try initializing a SentenceTransformer object with T-Systems-onsite/cross-en-de-roberta-sentence-transformer, it crashes. It can be resolved by manually installing protobuf. I saw that it is in your dependencies. This might be a Pipenv or sentence-transformers issue as well, but I thought I would start with you folks.

The error occurred on our cloud instance as well as on my local Windows machine. If you think the issue is related to another package, please let me know and I will contact them 😊

Thanks a lot

LysandreJik commented 3 years ago

Just to make sure, can you try installing sentencepiece? pip install sentencepiece
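
If the project is managed with pipenv, the equivalent command, so that the dependency also lands in the Pipfile and lock file, would presumably be:

pipenv install sentencepiece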

chschoenenberger commented 3 years ago

Pip says:

Requirement already satisfied: sentencepiece in c:\users\chrs\.virtualenvs\pythonproject-wdxdk-rq\lib\site-packages (0.1.95)

Pipenv "installs it" (I guess it just links it) and writes it to the lock file. Running the example again, I get the same error about protobuf.

LysandreJik commented 3 years ago

Okay, thank you for trying. Could you show me the steps you took to get this error, seeing as it occurs on both your cloud instance and your Windows machine? I'll try to reproduce the issue on my Windows machine to find out what's happening.

chschoenenberger commented 3 years ago

Yeah the steps are as follows:

  1. Create a new pipenv environment
  2. Install sentence-transformers
  3. Create a Python file with the following content:

     from sentence_transformers import SentenceTransformer
     SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')

  4. Run the Python file => Error
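
For reference, a minimal shell session for these steps might look like this (a sketch; the Python version and file name are assumptions, not from the report):

pipenv --python 3.8
pipenv install sentence-transformers
pipenv run python main.py
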
github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

tshrjn commented 3 years ago

Facing the same issue with T5, with the following demo code:

from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small" 
tokenizer = AutoTokenizer.from_pretrained(model_name)
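
One workaround that may apply here (an assumption, not something confirmed in this thread): requesting the slow, sentencepiece-based tokenizer directly skips the slow-to-fast conversion step that needs protobuf:

# use_fast=False avoids the protobuf-dependent slow->fast converter,
# but still requires sentencepiece to be installed.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
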
janaruto commented 3 years ago

I had the same problem. I tried many things, like the suggestions linked here, but nothing fixed the problem.

In the same environment I was also using the fastai library, which installs quite a few packages. So I created a new environment without fastai, and now it works.


redadmiral commented 3 years ago

As mentioned over here, pip install protobuf could help.
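
For pipenv users, declaring the dependency explicitly in the Pipfile (a sketch; versions left unpinned) would look like:

[packages]
sentence-transformers = "*"
protobuf = "*"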

raoulg commented 2 years ago

This is still a problem.

On an Ubuntu cloud instance, I installed the following in a venv:

torch
transformers
pandas
seaborn
jupyter
sentencepiece
protobuf==3.20.1

I had to downgrade protobuf to 3.20.x for it to work.

The expected behaviour would be that this works without needing to search the internet to land on this fix.
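
A pin that captures this constraint (assuming, as the comment suggests but does not verify, that the protobuf 4.x runtime changes are the trigger) would be:

pip install "protobuf==3.20.*"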

robhaslinger commented 2 years ago

Thanks @raoulg. I had the same issue working with the Pegasus model, actually from an example in Hugging Face's new book. Downgrading to 3.20.x was the solution.

mirekphd commented 1 year ago

I didn't have to downgrade, just install the missing protobuf (latest version). This can be reproduced with e.g. the Hugging Face example for the DONUT document classifier using our latest CUDA 11.8 containers: mirekphd/cuda-11.8-cudnn8-devel-ubuntu22.04:20230928. (Note that the official nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 containers seem to come with protobuf already preinstalled, so you won't reproduce the bug there.)

Perhaps protobuf should be added explicitly as a dependency of transformers?
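
For container images like the ones above, a hypothetical Dockerfile addition along these lines would make the dependency explicit:

# Hypothetical: install protobuf up front so tokenizer conversion works
# even in base images that don't ship it preinstalled.
RUN pip install --no-cache-dir protobuf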

manas95826 commented 10 months ago

I'm still facing the same error. I have fine-tuned a Mistral model, but when I try to run inference with it, it still gives me:

Could not complete request to HuggingFace API, Status Code: 500, Error:
LlamaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

I've run pip install protobuf in both environments (fine-tuning and inference).
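
Since the 500 comes from the serving side, it may be worth verifying inside the inference runtime itself that protobuf is importable there; a minimal check (and note the error's own hint that the runtime may need a restart after installation):

import google.protobuf
print(google.protobuf.__version__)  # should print a version rather than raise ImportError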