huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

run_mlm_wwm.py if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save): TypeError: 'NoneType' object is not iterable #16381

Closed muyuuuu closed 2 years ago

muyuuuu commented 2 years ago

Environment info

Information

Model I am using: RoBERTa

This script runs correctly:

export TRAIN_FILE=/path/to/train/file
export LTP_RESOURCE=/path/to/ltp/tokenizer
export BERT_RESOURCE=/path/to/bert/tokenizer
export SAVE_PATH=/path/to/data/ref.txt

python run_chinese_ref.py \
    --file_name=$TRAIN_FILE \
    --ltp=$LTP_RESOURCE \
    --bert=$BERT_RESOURCE \
    --save_path=$SAVE_PATH

But this one fails with an error:

export TRAIN_FILE=/path/to/train/file
export VALIDATION_FILE=/path/to/validation/file
export TRAIN_REF_FILE=/path/to/train/chinese_ref/file
export VALIDATION_REF_FILE=/path/to/validation/chinese_ref/file
export OUTPUT_DIR=/tmp/test-mlm-wwm

python run_mlm_wwm.py \
    --model_name_or_path roberta-base \
    --train_file $TRAIN_FILE \
    --validation_file $VALIDATION_FILE \
    --train_ref_file $TRAIN_REF_FILE \
    --validation_ref_file $VALIDATION_REF_FILE \
    --do_train \
    --do_eval \
    --output_dir $OUTPUT_DIR

The error output is:

[INFO|trainer.py:1047] 2022-03-24 17:08:46,271 >> Loading model from chinese-roberta-wwm-ext).
Traceback (most recent call last):
  File "/home/20031211375/pretrain/run_language_modeling.py", line 364, in <module>
    main()
  File "/home/20031211375/pretrain/run_language_modeling.py", line 328, in main
    trainer.train(model_path=model_path)
  File "/home/20031211375/.conda/envs/search/lib/python3.9/site-packages/transformers/trainer.py", line 1066, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/20031211375/.conda/envs/search/lib/python3.9/site-packages/transformers/trainer.py", line 1387, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable

Expected behavior

How can I fix this?
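
For reference, the TypeError in the traceback can be reproduced without the library. The sketch below assumes that `_keys_to_ignore_on_save` is `None` on the loaded model and that the load reported at least one missing key. `FakeModel`, `LoadResult`, the two helper functions, and the example key name are hypothetical stand-ins for illustration, not the real transformers classes, and the guard shown is one possible workaround rather than the library's own fix.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration; not the real transformers classes.
LoadResult = namedtuple("LoadResult", ["missing_keys", "unexpected_keys"])


class FakeModel:
    # Assumption: the attribute is None when the model declares no keys to ignore.
    _keys_to_ignore_on_save = None


def unguarded_match(model, load_result):
    # Same comparison as the failing line in the traceback.
    return set(load_result.missing_keys) == set(model._keys_to_ignore_on_save)


def guarded_match(model, load_result):
    # Treat None as "no keys to ignore" before building the set.
    ignored = model._keys_to_ignore_on_save or []
    return set(load_result.missing_keys) == set(ignored)


# The key name below is made up for the example.
result = LoadResult(missing_keys=["lm_head.decoder.weight"], unexpected_keys=[])
model = FakeModel()

try:
    unguarded_match(model, result)
except TypeError as err:
    print(f"unguarded comparison failed: {err}")

print(f"guarded comparison result: {guarded_match(model, result)}")
```

The guard only avoids the crash. It does not explain why keys are missing when the checkpoint is loaded, which still needs to be checked separately.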

sgugger commented 2 years ago

I don't think this script works for any model other than BERT, as it relies on the assumption that subword tokens have the prefix ##.
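
To illustrate the point about the ## prefix, here is a simplified sketch of grouping tokens into whole words by treating any token that starts with ## as a continuation of the previous word. The function name `group_whole_words` and both token lists are hypothetical and hand-written for the example; they are not real tokenizer output. The second list imitates a byte-level BPE vocabulary such as RoBERTa's, where word starts are marked with `Ġ` instead of continuations being marked with ##.

```python
def group_whole_words(tokens):
    """Group token indices into words, treating a '##' token as a continuation."""
    groups = []
    for index, token in enumerate(tokens):
        # Skip BERT-style special tokens.
        if token in ("[CLS]", "[SEP]"):
            continue
        if groups and token.startswith("##"):
            # Continuation piece: attach it to the current word.
            groups[-1].append(index)
        else:
            # Anything else is assumed to start a new word.
            groups.append([index])
    return groups


# Hand-written token lists for illustration, not real tokenizer output.
wordpiece_tokens = ["[CLS]", "token", "##ization", "works", "[SEP]"]
bpe_tokens = ["<s>", "token", "ization", "Ġworks", "</s>"]

print(group_whole_words(wordpiece_tokens))  # "token" and "##ization" share a group
print(group_whole_words(bpe_tokens))        # every token ends up in its own group
```

With the BPE-style input, no token carries the ## prefix, so no subwords are merged and whole-word masking degrades to masking individual tokens (including the special tokens, which this sketch does not recognise).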

GWPunk commented 2 years ago

transformers=4.5.0

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.