huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error while trying to run run_mlm_wwm.py using my saved model: TypeError: 'NoneType' object is not iterable #13234

Closed · jungminc88 closed this 2 years ago

jungminc88 commented 3 years ago

Environment info

Who can help

Information

Model I am using (Bert, XLNet ...):

The problem arises when using:

The tasks I am working on are:

To reproduce

Steps to reproduce the behavior:

1. I trained a BertForSequenceClassification model and saved the model and tokenizer:

# Save the fine-tuned classification checkpoint and its tokenizer
model.save_pretrained('output_mlm_cls')
tokenizer.save_pretrained('output_mlm_cls')

2. I tried to run run_mlm_wwm.py, giving the saved model above as the input model:

python run_mlm_wwm.py \
  --model_name_or_path /path/to/output_mlm_cls \
  --train_file /path/to/my_data.txt \
  --do_train \
  --output_dir /output_dir

I got this error message:

Traceback (most recent call last):
  File "run_mlm_wwm.py", line 408, in <module>
    main()
  File "run_mlm_wwm.py", line 367, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/cl/jungmin-c/.pyenv/versions/anaconda3-5.1.0/envs/jp/lib/python3.7/site-packages/transformers/trainer.py", line 1066, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/cl/jungmin-c/.pyenv/versions/anaconda3-5.1.0/envs/jp/lib/python3.7/site-packages/transformers/trainer.py", line 1387, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable
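The failing check reduces to calling set() on None: in the transformers version shown in the traceback, _keys_to_ignore_on_save defaults to None on models without an MLM head, so set(self.model._keys_to_ignore_on_save) raises. A minimal standalone reproduction (the missing-keys value is illustrative, not from the report):

missing_keys = ["cls.predictions.bias"]  # illustrative; any list works
keys_to_ignore_on_save = None            # what the loaded model provides here

try:
    set(missing_keys) == set(keys_to_ignore_on_save)
except TypeError as err:
    print(err)  # 'NoneType' object is not iterable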

Expected behavior

It should run and train the input model on the whole-word-masking MLM task. When I run the same command, changing only --model_name_or_path to one of the pretrained models provided by Hugging Face (cl-tohoku/bert-base-japanese-whole-word-masking), it runs without a problem, so the problem is not with the dataset.
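For reference, the invocation that works differs only in the model argument:

python run_mlm_wwm.py \
  --model_name_or_path cl-tohoku/bert-base-japanese-whole-word-masking \
  --train_file /path/to/my_data.txt \
  --do_train \
  --output_dir /output_dir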

qqaatw commented 3 years ago

Hi, since your case is an MLM task, you should probably use BertForMaskedLM instead of BertForSequenceClassification to train your model first, and then feed it into the run_mlm_wwm.py script.
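For example, a minimal sketch of that first step (the base checkpoint and output directory are placeholders, not from this thread):

from transformers import AutoTokenizer, BertForMaskedLM

# Load a base checkpoint with an MLM head (a randomly initialized head is
# added, with a warning, if the checkpoint lacks one).
name = "cl-tohoku/bert-base-japanese-whole-word-masking"  # placeholder base model
model = BertForMaskedLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

# ... fine-tune with the masked-language-modeling objective ...

# The saved checkpoint now has the MLM head that run_mlm_wwm.py expects.
model.save_pretrained("output_mlm")
tokenizer.save_pretrained("output_mlm")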

jungminc88 commented 3 years ago

@qqaatw Thank you for your suggestion!

Hi, since your case is an MLM task, you should probably use BertForMaskedLM instead of BertForSequenceClassification to train your model first, and then feed it into the run_mlm_wwm.py script.

My objective is to see the effect of training BERT on different tasks. I am wondering whether training on the MLM task after training on classification yields better results. Is there a way to do this using the script?

qqaatw commented 3 years ago

I get your point. You can use BertForPreTraining, which includes both prediction heads (MLM and NSP), to train the sentence-classification task first, and then feed the trained model into run_mlm_wwm.py to run the MLM task. Because BertForPreTraining already has both heads, running MLM afterwards will no longer raise an error about a missing MLM head.
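A rough sketch of that workflow, reading the suggestion as using the two-way NSP head for binary sentence classification (the model name and output directory are placeholders, not from this thread):

from transformers import AutoTokenizer, BertForPreTraining

name = "bert-base-uncased"  # placeholder; the thread uses a Japanese BERT
model = BertForPreTraining.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

inputs = tokenizer("An example sentence.", return_tensors="pt")
outputs = model(**inputs)

# Both heads are present, and both are saved with the model:
print(outputs.prediction_logits.shape)        # MLM head: (batch, seq_len, vocab)
print(outputs.seq_relationship_logits.shape)  # NSP head: (batch, 2)

# Fine-tune the two-way head on the classification task, then save; because
# the MLM head is kept, run_mlm_wwm.py can load this checkpoint afterwards.
model.save_pretrained("output_cls_with_mlm_head")
tokenizer.save_pretrained("output_cls_with_mlm_head")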

jungminc88 commented 3 years ago

@qqaatw That's a neat solution! Thank you!

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.