Hi, if you are not using the code below, then all the parameters of the BERT architecture will get tuned during model training:

```python
for param in bert.parameters():
    param.requires_grad = False
```

In that case, a model like BERT will require more training and more data to give meaningful output. `param.requires_grad` is `True` by default, so you don't have to set it to `True` yourself.
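As a quick check, here is a minimal sketch of the two options (assuming the `bert` object from this repo's code; the parameter-count printout is just illustrative):

```python
from transformers import AutoModel

bert = AutoModel.from_pretrained('bert-base-uncased')

# Option 1: freeze the backbone so only a classification head trains.
for param in bert.parameters():
    param.requires_grad = False

# Option 2: fine-tune everything. requires_grad is True by default, so
# simply skipping the loop above leaves every parameter trainable.
trainable = sum(p.numel() for p in bert.parameters() if p.requires_grad)
print(f'trainable parameters: {trainable}')  # 0 after freezing
```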
Coming to your second question: `model = bert` will not work here because the output vector of `bert` has length 768, while the output vector has to have length 2 because we have 2 classes in our target variable. `model = BERT_Arch(bert)` gives an output vector of length 2, and we then apply softmax to that output.
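For context, a wrapper along these lines produces the length-2 output. This is only a sketch; the actual `BERT_Arch` in the repo may use different hidden sizes and dropout:

```python
import torch.nn as nn

class BERT_Arch(nn.Module):
    """Sketch of a 2-class head on top of a BERT backbone."""
    def __init__(self, bert):
        super().__init__()
        self.bert = bert
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(768, 2)              # 768-dim pooled output -> 2 classes
        self.log_softmax = nn.LogSoftmax(dim=1)  # log-probabilities for NLLLoss

    def forward(self, sent_id, mask):
        # Second element of the tuple is the pooled [CLS] representation.
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)
        return self.log_softmax(self.fc(self.dropout(cls_hs)))
```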
Hello, first of all thanks for the useful repo. I experimented with your code on my task; the results were not so good, so I wanted to play around with it. I ran into 2 issues:
1] I don't want to freeze the BERT parameters, so I commented out those two lines (the `requires_grad` loop) as you mentioned. However, when I did that, all my sentences were classified as 0 (or sometimes all as 1). Any idea why that happened?
Alternatively, I set `param.requires_grad = True` instead of `False`, yet I saw the same behavior: a single label is assigned to all sentences, in some runs 0, in others 1.

2] Another thing I tried is to classify using the original BERT, so I set `model = bert` instead of `model = BERT_Arch(bert)`, but I get the following error while training. The trace stack:
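The error is expected here: `AutoModel` has no classification head, so its forward pass returns hidden states (a `ModelOutput` object, or a tuple with `return_dict=False`) rather than a logits tensor, and the loss function cannot consume that. A small sketch to see the shapes (toy input, illustrative only):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
bert = AutoModel.from_pretrained('bert-base-uncased')

batch = tokenizer(['a toy sentence'], return_tensors='pt')
with torch.no_grad():
    out = bert(**batch)

print(type(out).__name__)            # BaseModelOutputWithPoolingAndCrossAttentions
print(out.last_hidden_state.shape)   # (1, seq_len, 768) -- per-token states
print(out.pooler_output.shape)       # (1, 768)          -- not 2 class scores
```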
I added `return_dict=False` (`bert = AutoModel.from_pretrained('bert-base-uncased', return_dict=False)`), but the error just changed to `TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple`, with a stack trace similar to the one shown above.
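That `TypeError` has the same root cause: with `return_dict=False` the model returns a tuple `(last_hidden_state, pooler_output)`, and `nll_loss` receives the whole tuple instead of a Tensor. If you want to classify without the custom wrapper, one option (a sketch, not this repo's approach) is the ready-made classification model:

```python
from transformers import AutoModelForSequenceClassification

# Adds a randomly initialized 2-way classification head on top of BERT.
# Its forward pass returns logits of shape (batch, 2).
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2
)
```

Note that `NLLLoss` expects log-probabilities, so you would apply `log_softmax` to these logits first, or switch to `CrossEntropyLoss`, which takes raw logits.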