Unfreezing Bert Parameters and training only on BERT

Hello, first of all thanks for the useful repo. I experimented your code on my task, the results were not so good, so I wanted to play around with it. I faced 2 issues:

1] I don't want to freeze BERT parameters, so I commented those 2 lines as you mentioned:

for param in bert.parameters():
    param.requires_grad =  False

However, when I did that all my sentences were classified 0 (or sometimes all classified as 1), any idea why that happened ?

Alternatively, I set param.requires_grad = True, instead of False, yet I experienced the same behavior, a single label is assigned to all sentences in some runs its 0, other its 1.

2] Another thing I tried is to just classify using the original BERT so I set model=bert instead of model = BERT_Arch(bert), I get the following error while training:

TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not BaseModelOutputWithPoolingAndCrossAttentions

The trace stack:

 Epoch 1 / 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-789-c5138ddf6b25> in <module>()
     12 
     13     #train model
---> 14     train_loss, _ = train()
     15 
     16     #evaluate model

3 frames
<ipython-input-787-a8875e82e2a3> in train()
     28 
     29     # compute the loss between actual and predicted values
---> 30     loss = cross_entropy(preds, labels)
     31 
     32     # add on to the total loss

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    209 
    210     def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 211         return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
    212 
    213 

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2530     if size_average is not None or reduce is not None:
   2531         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2532     return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2533 
   2534 

TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not BaseModelOutputWithPoolingAndCrossAttentions

I added return_dict=False in (bert = AutoModel.from_pretrained('bert-base-uncased',return_dict=False)), but the error just changed to TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple with similar stack trace as the one shown above.

prateekjoshi565 / Fine-Tuning-BERT

Unfreezing Bert Parameters and training only on BERT #15