huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

task_distill.py raises StopIteration during execution #102

Closed. findlazygirl closed this 3 years ago.

findlazygirl commented 3 years ago

Hello author: I'd like to ask you a few questions. Test environment: torch 1.7.1, CUDA 11, GeForce RTX 3090 (x2).

I am running the first step of task_distill.py: "Step 1: use task_distill.py to run the intermediate layer distillation."

${FT_BERT_BASE_DIR}$ contains the fine-tuned BERT-base model.

```
python task_distill.py --teacher_model ${FT_BERT_BASE_DIR}$ \
                       --student_model ${GENERAL_TINYBERT_DIR}$ \
                       --data_dir ${TASK_DIR}$ \
                       --task_name ${TASK_NAME}$ \
                       --output_dir ${TMP_TINYBERT_DIR}$ \
                       --max_seq_length 128 \
                       --train_batch_size 16 \
                       --num_train_epochs 10 \
                       --aug_train \
                       --do_lower_case
```

Notes: for `${FT_BERT_BASE_DIR}$` I used a bert-base-uncased model fine-tuned on MRPC; for `${GENERAL_TINYBERT_DIR}$` I used the General_TinyBERT (4layer-312dim) released by the authors; `--train_batch_size` was changed from the suggested 32 to 16.
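For reference, the path variables were set roughly as follows (the first two values are reconstructed from the paths in the log below; `TASK_DIR`, `TASK_NAME`, and `TMP_TINYBERT_DIR` are hypothetical placeholders, not confirmed values):

```bash
# FT_BERT_BASE_DIR and GENERAL_TINYBERT_DIR reconstructed from the log output
# below; the remaining three are hypothetical placeholders for illustration.
FT_BERT_BASE_DIR=./model/finetune-model/finetune-MRPC
GENERAL_TINYBERT_DIR=./General_TinyBert/General_Tinybert_4L_312D
TASK_DIR=./glue_data/MRPC
TASK_NAME=MRPC
TMP_TINYBERT_DIR=./tmp_tinybert
```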
The other parameter settings are unchanged, but at runtime the script fails with StopIteration. What could be the cause? Thank you. Detailed error output:

```
.......
01/18 08:25:25 PM label: 1
01/18 08:25:25 PM label_id: 1
01/18 08:25:26 PM Model config {
  "_name_or_path": "../TinyBERT/model/bert-base-uncased/",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "mrpc",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "pre_trained": "",
  "training": "",
  "transformers_version": "4.2.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```

```
01/18 08:25:29 PM Loading model ./model/finetune-model/finetune-MRPC/pytorch_model.bin
01/18 08:25:29 PM loading model...
01/18 08:25:29 PM done!
01/18 08:25:29 PM Weights of TinyBertForSequenceClassification not initialized from pretrained model: ['fit_dense.weight', 'fit_dense.bias']
01/18 08:25:29 PM Weights from pretrained model not used in TinyBertForSequenceClassification: ['bert.embeddings.position_ids']
01/18 08:25:34 PM Model config {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "cell": {},
  "emb_size": 312,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 312,
  "initializer_range": 0.02,
  "intermediate_size": 1200,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "pre_trained": "",
  "structure": [],
  "training": "",
  "type_vocab_size": 2,
  "vocab_size": 30522
}
```

```
01/18 08:25:35 PM Loading model ./General_TinyBert/General_Tinybert_4L_312D/pytorch_model.bin
01/18 08:25:35 PM loading model...
01/18 08:25:35 PM done!
01/18 08:25:35 PM Weights of TinyBertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
01/18 08:25:35 PM Weights from pretrained model not used in TinyBertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
01/18 08:25:35 PM Running training
01/18 08:25:35 PM   Num examples = 225519
01/18 08:25:35 PM   Batch size = 16
01/18 08:25:35 PM   Num steps = 140940
01/18 08:25:35 PM n: module.bert.embeddings.word_embeddings.weight
01/18 08:25:35 PM n: module.bert.embeddings.position_embeddings.weight
01/18 08:25:35 PM n: module.bert.embeddings.token_type_embeddings.weight
01/18 08:25:35 PM n: module.bert.embeddings.LayerNorm.weight
01/18 08:25:35 PM n: module.bert.embeddings.LayerNorm.bias
........
01/18 08:25:35 PM n: module.fit_dense.weight
01/18 08:25:35 PM n: module.fit_dense.bias
01/18 08:25:35 PM Total parameters: 14591258
Iteration:   0%|          | 0/14095 [00:04<?, ?it/s]
Epoch:   0%|          | 0/10 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "task_distill.py", line 1092, in <module>
    main()
  File "task_distill.py", line 936, in main
    is_student=True)
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/amax/zhuqq/code/TinyBERT/transformer/modeling.py", line 1133, in forward
    output_all_encoded_layers=True, output_att=True)
  File "/home/amax/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/amax/zhuqq/code/TinyBERT/transformer/modeling.py", line 829, in forward
    dtype=next(self.parameters()).dtype)  # fp16 compatibility
StopIteration
```
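The failing call is `next(self.parameters())` inside `forward()` in `transformer/modeling.py` (line 829). Starting with torch 1.5, `nn.DataParallel` creates replicas that do not expose their parameters, so `self.parameters()` yields an empty iterator inside each replica and `next()` raises StopIteration on any multi-GPU run; torch 1.7.1 with two RTX 3090s hits exactly this path. A minimal sketch of a common workaround, assuming that line is the standard BERT attention-mask dtype cast (this is not an official patch from the repo):

```python
# transformer/modeling.py, around line 829 (workaround sketch, not official).
# Original line:
#   extended_attention_mask = extended_attention_mask.to(
#       dtype=next(self.parameters()).dtype)  # fp16 compatibility
# Under torch >= 1.5, nn.DataParallel replicas expose no parameters, so
# next(self.parameters()) raises StopIteration inside forward().
try:
    param_dtype = next(self.parameters()).dtype
except StopIteration:
    # Running inside a DataParallel replica: parameters() is empty.
    # Assume fp32; use torch.float16 here if the model runs in half precision.
    param_dtype = torch.float32
extended_attention_mask = extended_attention_mask.to(dtype=param_dtype)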

OutstanderWang commented 2 years ago

@findlazygirl Hello, I ran into the same problem. How did you solve it?

hes666 commented 1 year ago

I ran into this problem as well; it seems to be caused by a torch version mismatch.
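That matches the torch >= 1.5 `nn.DataParallel` behavior described above. Downgrading torch below 1.5 is usually not an option here, since those builds lack CUDA 11 support and cannot drive an RTX 3090. Besides patching `modeling.py` as sketched earlier, a code-free workaround is to expose a single GPU so `DataParallel` never replicates the model:

```bash
# With one visible device, nn.DataParallel calls the wrapped module directly
# instead of creating parameter-less replicas, so the StopIteration path is
# never reached (at the cost of training on a single GPU).
CUDA_VISIBLE_DEVICES=0 python task_distill.py \
    --teacher_model ${FT_BERT_BASE_DIR}$ \
    --student_model ${GENERAL_TINYBERT_DIR}$ \
    --data_dir ${TASK_DIR}$ \
    --task_name ${TASK_NAME}$ \
    --output_dir ${TMP_TINYBERT_DIR}$ \
    --max_seq_length 128 \
    --train_batch_size 16 \
    --num_train_epochs 10 \
    --aug_train \
    --do_lower_case
```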