In a nutshell, I am using BertForSequenceClassification (PyTorch) with dccuchile/bert-base-spanish-wwm-cased to solve a binary classification problem. I trained the network and evaluated the model on a testing dataset (different from the training dataset), reaching an acc and val_acc between 0.85 and 0.9. However, after I save the model and load it again in another script, the accuracy is similar to that of a random classifier (0.41).
The problem arises when using:
[ ] the official example scripts: (give details below)
[X] my own modified scripts: (give details below)
The task I am working on is:
[ ] an official GLUE/SQUaD task: (give the name)
[X] my own task or dataset: (give details below)
To reproduce
This is the code I am using for training and evaluating (during training):
import torch
from transformers import AdamW

criterion = torch.nn.CrossEntropyLoss()
criterion = criterion.to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)

for epoch in range(4):
    i = 0
    correct_predictions = 0  # reset the training counter at the start of each epoch

    # Train this epoch
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['label'].to(device)
        # transformers 3.4.0 returns a (loss, logits) tuple when labels are passed
        loss, logits = model(input_ids, attention_mask=attention_mask, labels=labels)
        _, preds = torch.max(logits, dim=1)
        correct_predictions += torch.sum(preds == labels)
        i += 1
        acc = correct_predictions.item() / (batch_size * i)
        loss.backward()
        optimizer.step()

    # Eval this epoch with the testing dataset
    model = model.eval()
    correct_predictions = 0
    with torch.no_grad():
        for batch in test_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)
            loss, logits = model(input_ids, attention_mask=attention_mask, labels=labels)
            _, preds = torch.max(logits, dim=1)
            correct_predictions += torch.sum(preds == labels)

# Save the model and the tokenizer once training has finished
model.bert.save_pretrained("my-model")
tokenizer.save_pretrained("my-model")
After this step, I got good accuracy after the first epoch.
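The running training accuracy above divides by batch_size * i, which overstates the denominator whenever the last batch is smaller than batch_size. A minimal sketch of an alternative that counts the examples actually seen is shown here; the function name running_accuracy and the plain-list inputs are illustrative assumptions, not part of my script:

```python
from typing import Iterable, Sequence, Tuple


def running_accuracy(batches: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Accuracy over all batches, dividing by the number of examples actually seen."""
    correct = 0
    total = 0
    for preds, labels in batches:
        if len(preds) != len(labels):
            raise ValueError("each batch needs one prediction per label")
        # Count matching positions in this batch
        correct += sum(int(p == y) for p, y in zip(preds, labels))
        total += len(labels)
    # Avoid dividing by zero when no batches were provided
    return correct / total if total else 0.0


# Hypothetical predictions and labels; the second batch is smaller than the first
batches = [([1, 0, 1, 1], [1, 0, 0, 1]), ([0, 1], [0, 1])]
print(running_accuracy(batches))
```

With tensors, the same idea would be to add labels.size(0) to a total counter in each step instead of multiplying batch_size by the step count.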
Then, I load the model again in another script:
model = BertForSequenceClassification.from_pretrained("my-model")
model = model.to(device)  # keep the model on the same device as the inputs

# Eval with the testing dataset
model = model.eval()
correct_predictions = 0
with torch.no_grad():
    for batch in test_loader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['label'].to(device)
        loss, logits = model(input_ids, attention_mask=attention_mask, labels=labels)
        _, preds = torch.max(logits, dim=1)
        correct_predictions += torch.sum(preds == labels)

print(correct_predictions.item() / len(test_df))
However, the accuracy is about the same as if I had loaded an untrained model.
Expected behavior
After loading a model saved with save_pretrained, the model should give similar accuracy and loss on the same data.
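A minimal sketch of a round-trip check for this expectation, offered as something to test rather than a confirmed diagnosis. It assumes a tiny, randomly initialised BertConfig (the sizes are hypothetical, chosen so nothing is downloaded) and sets the classification head to constant weights as a stand-in for training. It then compares what comes back after saving the whole model with what comes back after saving only model.bert, as my training script does:

```python
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialised model; the sizes are illustrative assumptions
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = BertForSequenceClassification(config).eval()

# Stand-in for training: give the classification head recognisable weights
torch.nn.init.constant_(model.classifier.weight, 1.0)

with tempfile.TemporaryDirectory() as full_dir, tempfile.TemporaryDirectory() as encoder_dir:
    model.save_pretrained(full_dir)          # encoder plus classification head
    model.bert.save_pretrained(encoder_dir)  # encoder only

    from_full = BertForSequenceClassification.from_pretrained(full_dir).eval()
    from_encoder = BertForSequenceClassification.from_pretrained(encoder_dir).eval()

# Compare the reloaded classification heads with the one that was saved
head_kept_full = torch.equal(from_full.classifier.weight, model.classifier.weight)
head_kept_encoder = torch.equal(from_encoder.classifier.weight, model.classifier.weight)
print("head restored from full save:", head_kept_full)
print("head restored from encoder-only save:", head_kept_encoder)
```

If the head is not restored from the encoder-only directory, that would be consistent with the near-random accuracy I see after reloading, since the classifier layer would be freshly initialised.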
Environment info
transformers version: 3.4.0
Who can help
@LysandreJik
Information
I also posted this on Stack Overflow and received a comment pointing to two similar issues about saving and loading custom models. The original question can be found at: https://stackoverflow.com/questions/64666510/huggingface-transformers-low-accuracy-after-load-custom-pretrained-model-in-a-t?noredirect=1#comment114344159_64666510