prateekjoshi565 / Fine-Tuning-BERT


Unfreezing Bert Parameters and training only on BERT #15

Open · IS5882 opened this issue 2 years ago

IS5882 commented 2 years ago

Hello, first of all thanks for the useful repo. I experimented with your code on my task, but the results were not so good, so I wanted to play around with it. I ran into two issues:

1] I don't want to freeze the BERT parameters, so I commented out these two lines, as you suggested:

for param in bert.parameters():
    param.requires_grad =  False

However, when I did that, all my sentences were classified as 0 (or sometimes all as 1). Any idea why that happened?

Alternatively, I set param.requires_grad = True instead of False, yet I saw the same behavior: a single label is assigned to all sentences; in some runs it's 0, in others it's 1.
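
For reference, this is the quick check I used to confirm whether the BERT weights are actually trainable (a rough sketch; model here is the BERT_Arch(bert) object from the notebook):

# count trainable vs. total parameters to verify that BERT is unfrozen
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")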

2] Another thing I tried is to classify using the original BERT directly, so I set model = bert instead of model = BERT_Arch(bert). I get the following error while training:

TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not BaseModelOutputWithPoolingAndCrossAttentions

The stack trace:

 Epoch 1 / 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-789-c5138ddf6b25> in <module>()
     12 
     13     #train model
---> 14     train_loss, _ = train()
     15 
     16     #evaluate model

3 frames
<ipython-input-787-a8875e82e2a3> in train()
     28 
     29     # compute the loss between actual and predicted values
---> 30     loss = cross_entropy(preds, labels)
     31 
     32     # add on to the total loss

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    209 
    210     def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 211         return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
    212 
    213 

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2530     if size_average is not None or reduce is not None:
   2531         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2532     return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2533 
   2534 

TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not BaseModelOutputWithPoolingAndCrossAttentions

I added return_dict=False (i.e. bert = AutoModel.from_pretrained('bert-base-uncased', return_dict=False)), but the error just changed to TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple, with a similar stack trace to the one shown above.
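
For anyone hitting the same error, this is roughly what the raw bert call returns (a small standalone sketch, not the notebook code; the attribute names are from the transformers docs):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
bert = AutoModel.from_pretrained('bert-base-uncased')

batch = tokenizer(["a test sentence"], return_tensors="pt")
with torch.no_grad():
    out = bert(**batch)

print(type(out))                    # BaseModelOutputWithPoolingAndCrossAttentions
print(out.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
print(out.pooler_output.shape)      # torch.Size([1, 768]) -- a tensor, but 768-dim, not 2-dim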

prateekjoshi565 commented 2 years ago

Hi, if you are not using the code below, then all the parameters of the BERT model will get tuned during training.

for param in bert.parameters():
    param.requires_grad =  False

In that case, a model like BERT needs more training and more data to produce meaningful output. Also, param.requires_grad is True by default, so you don't have to set it to True explicitly.
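
If you do unfreeze everything, a much smaller learning rate usually helps; for example (a sketch with a commonly used value, not the exact setting from this repo):

from torch.optim import AdamW

# with all BERT weights trainable, a large learning rate can collapse the
# classifier to predicting a single class; 2e-5 is a common starting point
optimizer = AdamW(model.parameters(), lr=2e-5)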

Coming to your second question: model = bert will not work here because the output vector of bert has length 768, while the output vector needs length 2 because we have 2 classes in our target variable. model = BERT_Arch(bert) gives an output vector of length 2, and then we apply a log-softmax to this output.
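
To make the point concrete, here is a stripped-down sketch of what BERT_Arch does (the exact layers in the repo may differ; the key part is the 768 → 2 mapping and the log-softmax that the NLLLoss in the training loop expects):

import torch.nn as nn

class SimpleBertClassifier(nn.Module):   # minimal, hypothetical version of BERT_Arch
    def __init__(self, bert):
        super().__init__()
        self.bert = bert
        self.fc = nn.Linear(768, 2)              # pooled 768-dim output -> 2 classes
        self.log_softmax = nn.LogSoftmax(dim=1)  # pairs with nn.NLLLoss in the training loop

    def forward(self, sent_id, mask):
        # return_dict=False makes BERT return a tuple: (last_hidden_state, pooled [CLS] output)
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)
        return self.log_softmax(self.fc(cls_hs))  # a Tensor of shape [batch, 2]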