nlp-uoregon / trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Apache License 2.0

OSError: Can't load config for 'xlm-roberta-base'. #65

Closed kirianguiller closed 10 months ago

kirianguiller commented 1 year ago

Hello everyone,

For the past few days I have been getting an error when running a Pipeline.

I am using a fresh install of Python 3.8 with trankit 1.1.1.

Here is the code to reproduce it:

# test_trankit.py
from trankit import Pipeline

p = Pipeline(lang='english')

And here is the error I get:

Downloading: 100%|████████████████████████████████████████████████████████████████| 5.07M/5.07M [00:06<00:00, 733kB/s]
http://nlp.uoregon.edu/download/trankit/v1.0.0/xlm-roberta-base/english.zip
Downloading: 100%|██████████████████████████████████████████████████████████████| 47.9M/47.9M [00:03<00:00, 12.2MiB/s]
Loading pretrained XLM-Roberta, this may take a while...
Traceback (most recent call last):
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/configuration_utils.py", line 234, in get_config_dict
    resolved_config_file = cached_path(
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/file_utils.py", line 267, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file xlm-roberta-base/config.json not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_trankit.py", line 23, in <module>
    p = Pipeline(lang='english')
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/pipeline.py", line 82, in __init__
    self._embedding_layers = Multilingual_Embedding(self._config)
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/models/base_models.py", line 55, in __init__
    super(Multilingual_Embedding, self).__init__(config, task_name=model_name)
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/models/base_models.py", line 13, in __init__
    self.xlmr = XLMRobertaModel.from_pretrained(config.embedding_name,
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/modeling_utils.py", line 578, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/configuration_utils.py", line 202, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/configuration_utils.py", line 253, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'xlm-roberta-base'. Make sure that:

- 'xlm-roberta-base' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'xlm-roberta-base' is the correct path to a directory containing a config.json file

I've tried adding logging to the trankit code (in the cached_path method), but I didn't manage to debug it. I suspect a change in the Hugging Face pretrained model config (the config.json file being named differently), but I don't know enough of the context or history to take the debugging further.

Thanks for your help!
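One thing that can be checked independently of trankit: the first traceback reports `file xlm-roberta-base/config.json not found`, and the error message says the name is accepted either as a model identifier or as a path to a directory containing config.json. A possible reading (an assumption, not confirmed in this thread) is that a local directory named `xlm-roberta-base` is being picked up instead of the hub identifier. The helper below is a minimal sketch for testing that guess; `shadowing_directory` is a hypothetical name, not part of trankit or transformers.

```python
import os


def shadowing_directory(model_name="xlm-roberta-base", base_dir="."):
    """Return the path of a local directory named like the model that has
    no config.json inside it, or None if no such directory exists.

    Hypothetical diagnostic helper: it only inspects the file system and
    does not call trankit or transformers.
    """
    candidate = os.path.join(base_dir, model_name)
    has_config = os.path.isfile(os.path.join(candidate, "config.json"))
    if os.path.isdir(candidate) and not has_config:
        return candidate
    return None


if __name__ == "__main__":
    # Check the directory the script is run from.
    hit = shadowing_directory()
    if hit is None:
        print("No local directory shadows the model identifier here.")
    else:
        print("Found a directory without config.json:", hit)
```

If the helper reports a directory, running the script from a different working directory would be one way to see whether the guess holds.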

peshmerge commented 1 year ago

Have you found a solution to this problem? I'm facing the same issue!

minhhdvn commented 10 months ago

Hi @kirianguiller @peshmerge, thanks for letting us know. This issue might be caused by Trankit getting confused about which folder contains the cached models. It can usually be solved by deleting all cached model files and downloading the Trankit models again. Please reopen this issue if you're still facing it.
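The "delete all cached model files" step could be scripted roughly as follows. This is a minimal sketch, not an official trankit utility: `clear_model_cache` is a hypothetical helper, and the `cache/trankit` location used in the example is an assumption about where the cache lives, so check the actual path on your machine before deleting anything.

```python
import shutil
from pathlib import Path


def clear_model_cache(cache_dir):
    """Delete the given cache directory and everything under it.

    Returns True if a directory was removed, False if nothing was there.
    Hypothetical helper; it does not call trankit itself.
    """
    cache_path = Path(cache_dir)
    if not cache_path.is_dir():
        return False
    shutil.rmtree(cache_path)
    return True


if __name__ == "__main__":
    # Assumed cache location, relative to the working directory.
    removed = clear_model_cache("cache/trankit")
    print("Cache removed." if removed else "No cache directory found.")
    # Afterwards, creating the pipeline again should re-download the models:
    #   from trankit import Pipeline
    #   p = Pipeline(lang='english', cache_dir='cache/trankit')
```

Passing `cache_dir` explicitly when building the Pipeline keeps the folder you delete and the folder trankit downloads into the same.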