huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error while trying to run run_mlm_wwm.py using my saved model: TypeError: 'NoneType' object is not iterable #13234

Closed · jungminc88 closed this issue 3 years ago

jungminc88 commented 3 years ago

To reproduce

Steps to reproduce the behavior:

1. I trained a BertForSequenceClassification model and saved the model and tokenizer:

# `model` is the fine-tuned BertForSequenceClassification; `tokenizer` is its matching tokenizer
model.save_pretrained('output_mlm_cls')
tokenizer.save_pretrained('output_mlm_cls')

2. I tried to run run_mlm_wwm.py, passing the saved model above as the input model:

python run_mlm_wwm.py \
    --model_name_or_path /path/to/output_mlm_cls \
    --train_file /path/to/my_data.txt \
    --do_train \
    --output_dir /output_dir

I got this error message:

Traceback (most recent call last):
  File "run_mlm_wwm.py", line 408, in <module>
    main()
  File "run_mlm_wwm.py", line 367, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/cl/jungmin-c/.pyenv/versions/anaconda3-5.1.0/envs/jp/lib/python3.7/site-packages/transformers/trainer.py", line 1066, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/cl/jungmin-c/.pyenv/versions/anaconda3-5.1.0/envs/jp/lib/python3.7/site-packages/transformers/trainer.py", line 1387, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable
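The last frame points at the root cause: the saved BertForSequenceClassification checkpoint leaves `_keys_to_ignore_on_save` at its default of None, and the trainer calls set() on it. A minimal sketch of just that comparison, outside the Trainer (the key name is illustrative):

# what the line in trainer.py effectively evaluates for this checkpoint
keys_to_ignore_on_save = None              # default value on the model class
missing_keys = ['cls.predictions.bias']    # illustrative: MLM weights absent from the checkpoint

set(missing_keys) == set(keys_to_ignore_on_save)
# TypeError: 'NoneType' object is not iterable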

Expected behavior

It should run and train the input model on the whole-word-masking MLM task. When I run the same command, changing only --model_name_or_path to one of the pretrained models HuggingFace provides (cl-tohoku/bert-base-japanese-whole-word-masking), it runs without a problem, so it is not a problem with the dataset.
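For comparison, the invocation that works differs only in the model path:

python run_mlm_wwm.py \
    --model_name_or_path cl-tohoku/bert-base-japanese-whole-word-masking \
    --train_file /path/to/my_data.txt \
    --do_train \
    --output_dir /output_dir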

qqaatw commented 3 years ago

Hi, since your case is an MLM task, you should probably use BertForMaskedLM instead of BertForSequenceClassification to train your model first, and then feed it into the run_mlm_wwm.py script.
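A minimal sketch of that suggestion (the base checkpoint and output directory are illustrative, not from this thread):

from transformers import BertForMaskedLM, BertTokenizer

# an MLM model keeps the cls.predictions.* weights the script expects to find
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# ... MLM fine-tuning happens here ...

# this directory can then be passed as --model_name_or_path to run_mlm_wwm.py
model.save_pretrained('output_mlm')
tokenizer.save_pretrained('output_mlm')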

jungminc88 commented 3 years ago

@qqaatw Thank you for your suggestion!

Hi, since your case is an MLM task, you should probably use BertForMaskedLM instead of BertForSequenceClassification to train your model first, and then feed it into the run_mlm_wwm.py script.

My objective is to see the effect of training BERT on different tasks. I am wondering whether training on the MLM task after training on classification yields better results. Is there a way to do this using the script?

qqaatw commented 3 years ago

I see your point. You can use BertForPreTraining, which includes both pretraining heads (MLM and NSP), to train the sentence classification task first, and then feed the trained model into run_mlm_wwm.py for the MLM task. Because BertForPreTraining already has both heads, running MLM afterwards will no longer raise an error about a missing MLM head. A sketch of this idea follows.
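Here is one way this could look, assuming a binary task routed through the 2-way NSP (seq_relationship) head; the checkpoint name, example sentence, label, and output directory are all illustrative:

import torch
from transformers import BertForPreTraining, BertTokenizer

# BertForPreTraining carries both the MLM head and the 2-way NSP head
model = BertForPreTraining.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('An example sentence to classify.', return_tensors='pt')
labels = torch.tensor([1])  # illustrative binary label, reusing the 2-way NSP head

# compute the classification loss manually from the seq_relationship logits
outputs = model(**inputs)
loss = torch.nn.functional.cross_entropy(outputs.seq_relationship_logits, labels)
loss.backward()  # trains the pooler and the seq_relationship head

# the saved checkpoint keeps the MLM head, so run_mlm_wwm.py can reload it cleanly
model.save_pretrained('output_pretraining_cls')
tokenizer.save_pretrained('output_pretraining_cls')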

jungminc88 commented 3 years ago

@qqaatw That's a neat solution! Thank you!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.