pmichel31415 / are-16-heads-really-better-than-1

Code for the paper "Are Sixteen Heads Really Better than One?"

RuntimeError: can't retain_grad on Tensor that has requires_grad=False #7

Open YJiangcm opened 3 years ago

YJiangcm commented 3 years ago

Sorry to bother you. I ran into a bug while running `heads_pruning.sh`, and the error is:

```
12:21:27-INFO: Running evaluation
12:21:27-INFO:   Num examples = 9815
12:21:27-INFO:   Batch size = 32
Evaluating:   0% 0/307 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 585, in <module>
    main()
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 521, in main
    scorer=processor.scorer,
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
    input_ids, segment_ids, input_mask, label_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
    output_all_encoded_layers=False, return_att=return_att)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
    hidden_states, attn = layer_module(hidden_states, attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
    attention_output, attn = self.attention(hidden_states, attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
    self_output, attn = self.self(input_tensor, attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 307, in forward
    self.context_layer_val.retain_grad()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 326, in retain_grad
    raise RuntimeError("can't retain_grad on Tensor that has requires_grad=False")
RuntimeError: can't retain_grad on Tensor that has requires_grad=False
Evaluating:   0% 0/307 [00:00<?, ?it/s]
```

I don't know how to fix it. Hope you can help me!
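
(For context, the failing call can be reproduced outside the repo. The snippet below is a minimal sketch, not code from this project: calling `retain_grad()` on a tensor produced under `torch.no_grad()` raises exactly this error, because such tensors have `requires_grad=False`.)

```python
import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

with torch.no_grad():      # evaluation path, as in the eval loop run by heads_pruning.sh
    y = lin(x)             # y.requires_grad is False inside no_grad
    y.retain_grad()        # RuntimeError: can't retain_grad on Tensor that has requires_grad=False
```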

HayeonLee commented 2 years ago

Hi. This error occurs because `retain_grad()` is called on a tensor created inside a `with torch.no_grad()` block. The gradient of the context layer is only needed in the `calculate_head_importance` function (where it is read as `grad_ctx = ctx.grad`), so outside that function you can simply deactivate the `self.context_layer_val.retain_grad()` call to fix it.
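
A minimal, self-contained sketch of that idea (the class and names below are illustrative stand-ins for the self-attention module in `modeling.py`, not the repo's actual code): only call `retain_grad()` when autograd is tracking the context tensor, so plain evaluation under `torch.no_grad()` no longer crashes, while the head-importance pass still gets `ctx.grad` after `backward()`.

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    """Toy stand-in for the attention module that stashes its context tensor."""
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context_layer_val = None

    def forward(self, x):
        ctx = self.proj(x)
        self.context_layer_val = ctx
        # Only ask autograd to keep ctx.grad when gradients are actually being
        # tracked, i.e. when head importance is computed with grads enabled.
        if ctx.requires_grad:
            ctx.retain_grad()
        return ctx

layer = TinyAttention()
x = torch.randn(2, 8)

# Plain evaluation (the crashing path): under no_grad the guard skips retain_grad().
with torch.no_grad():
    layer(x)

# Head-importance path: grads enabled, so ctx.grad is available after backward().
out = layer(x)
out.sum().backward()
print(layer.context_layer_val.grad.shape)  # torch.Size([2, 8])
```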

xiyiyia commented 1 year ago

Adding `if context_layer.requires_grad:` around line 309 of `modeling.py` works for me, like this:

```python
if context_layer.requires_grad:
    self.context_layer_val.retain_grad()
```

Hope it solves your problem.