huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

UnboundLocalError: local variable 'tokenizer' referenced before assignment #3204

Closed PosoSAgapo closed 4 years ago

PosoSAgapo commented 4 years ago

I am running the example code from the homepage. However, I met this problem.

import torch
from transformers import *

MODELS = [(BertModel,       BertTokenizer,       'bert-base-uncased'),
          (OpenAIGPTModel,  OpenAIGPTTokenizer,  'openai-gpt'),
          (GPT2Model,       GPT2Tokenizer,       'gpt2'),
          (CTRLModel,       CTRLTokenizer,       'ctrl'),
          (TransfoXLModel,  TransfoXLTokenizer,  'transfo-xl-wt103'),
          (XLNetModel,      XLNetTokenizer,      'xlnet-base-cased'),
          (XLMModel,        XLMTokenizer,        'xlm-mlm-enfr-1024'),
          (DistilBertModel, DistilBertTokenizer, 'distilbert-base-cased'),
          (RobertaModel,    RobertaTokenizer,    'roberta-base'),
          (XLMRobertaModel, XLMRobertaTokenizer, 'xlm-roberta-base'),
         ]
# Load each tokenizer/model pair and encode a sample sentence
for model_class, tokenizer_class, pretrained_weights in MODELS:
    tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
    model = model_class.from_pretrained(pretrained_weights)
    input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])
    with torch.no_grad():
        last_hidden_states = model(input_ids)[0]
UnboundLocalError: local variable 'tokenizer' referenced before assignment

This happened when model_class got to XLMModel. I do not quite understand why this happens, because the problem only occurs when the model is XLMModel.

PosoSAgapo commented 4 years ago

Plus: I have seen a similar issue in this project; however, the problem in that issue was that the user did not pass the right pretrained weights. I do not think that is the solution here.

PosoSAgapo commented 4 years ago

Similarly, I also tried DistilBert, Roberta, and XLMRoberta; these three models do not work for me either, and the error message is the same as the one I described above.

BramVanroy commented 4 years ago

I just tried this and cannot reproduce the behaviour that you indicate. Are you running this from a notebook? Try restarting your kernel and running it again.

PosoSAgapo commented 4 years ago

> I just tried this and cannot reproduce the behaviour that you indicate. Are you running this from a notebook? Try restarting your kernel and running it again.

I run this program on a Linux GPU server. I tried restarting the Python program; however, the problem still exists. Could this be a problem with downloading the model?

BramVanroy commented 4 years ago

No. UnboundLocalError simply means that Python hasn't seen this variable before, which cannot occur in your code snippet. If the models were downloaded incorrectly, you'd get another error. Even if the tokenizer was initialized as None you'd get another error.

Are you sure that this is the only code that is running? Please post the full trace.
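
For context, here is a minimal sketch (plain Python, not the library's actual code) of how an UnboundLocalError like this can arise: if the assignment inside a try block is skipped and the exception is swallowed, a later reference hits a name that was never bound.

def load_tokenizer(fail=True):
    try:
        if fail:
            raise OSError("simulated download failure")
        tokenizer = "loaded tokenizer"  # skipped when the download fails
    except OSError:
        pass  # exception swallowed, so `tokenizer` is never assigned
    return tokenizer  # UnboundLocalError when fail=True

load_tokenizer()
# UnboundLocalError: local variable 'tokenizer' referenced before assignment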

PosoSAgapo commented 4 years ago

> No. UnboundLocalError simply means that Python hasn't seen this variable before, which cannot occur in your code snippet. If the models were downloaded incorrectly, you'd get another error. Even if the tokenizer was initialized as None you'd get another error.
>
> Are you sure that this is the only code that is running? Please post the full trace.

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/users4/bwchen/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 302, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/users4/bwchen/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 438, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/users4/bwchen/anaconda3/lib/python3.7/site-packages/transformers/tokenization_bert.py", line 164, in __init__
    "model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`".format(vocab_file))
ValueError: Can't find a vocabulary file at path '/users4/bwchen/.cache/torch/transformers/37cc1eaaea18a456726fc28ecb438852f0ca1d9e7d259e6e3747ee33065936f6'. To load the vocabulary from a Google pretrained model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`

I am sure that was the only code running at the time; I am trying to reproduce this error. This time it works properly when model_class gets to the aforementioned 'wrong' model XLMModel. However, as the loop continues, I hit another problem when the model is DistilBert. Does this error mean that I have to use BertTokenizer instead of DistilBertTokenizer?
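
If the cached vocabulary file really is corrupted, one possible workaround (assuming a transformers version whose from_pretrained accepts the force_download flag) is to bypass the cache and re-fetch the files:

from transformers import DistilBertTokenizer

# Re-download the vocabulary files instead of reusing a possibly
# corrupted entry under ~/.cache/torch/transformers
tokenizer = DistilBertTokenizer.from_pretrained(
    "distilbert-base-cased", force_download=True
)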

nbroad1881 commented 4 years ago

I can also attest to this error.

I am using a Kaggle notebook, and I get this error after running this in my first cell. Most of it is default code; the bottom two lines are the key ones.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.
print(os.getcwd(), os.listdir())

from transformers import RobertaTokenizer
tknzr = RobertaTokenizer.from_pretrained('roberta-large')

Error thrown

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-1-7957db35f110> in <module>
     19 from transformers import RobertaTokenizer
     20 
---> 21 tknzr = RobertaTokenizer.from_pretrained('roberta-large')

/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils.py in from_pretrained(cls, *inputs, **kwargs)
    300 
    301         """
--> 302         return cls._from_pretrained(*inputs, **kwargs)
    303 
    304 

/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils.py in _from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
    442 
    443         # Save inputs and kwargs for saving and re-loading with ``save_pretrained``
--> 444         tokenizer.init_inputs = init_inputs
    445         tokenizer.init_kwargs = init_kwargs
    446 

UnboundLocalError: local variable 'tokenizer' referenced before assignment

Kaggle runs transformers version 2.3.0 by default. After updating to 2.5.1 it worked just fine. To update on Kaggle, turn the internet option on in the settings on the right side, then run `!pip install -U transformers`.
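
As a quick sanity check (a minimal sketch, assuming only that the package exposes __version__), you can confirm which version is installed before and after upgrading:

import transformers

# 2.3.0 exhibits the bug; 2.5.1 or newer does not
print(transformers.__version__)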

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.