The `Trainer` class accepts two kinds of models as input: a pretrained model from `transformers`, or a plain PyTorch module.
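For context, a minimal sketch of the two kinds of inputs (the names and arguments here are illustrative, not from the original post):

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch.nn as nn

# Option 1: a pretrained transformers model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))

# Option 2: any torch.nn.Module; the Trainer expects its forward to either
# return a loss as the first element, or outputs a custom compute_loss can use
class CustomModule(nn.Module):
    def forward(self, input_ids, attention_mask, labels=None):
        ...
```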
I am trying to plug the classifier class implemented by Olesya into my `Trainer`. However, I am currently getting an error: `RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x17 and 768x50)`.
Note that I have tweaked it a little bit, especially the line that defines the last hidden layer, to remove the `[CLS]` tokens.
This is the code:
```python
import numpy as np

config = {
    "hidden_layer_size": 768,
    "classifier_intermediate_layer_size": 50,
    "classifier_layer_size": 17,
}


def tokenize(batch):
    """Tokenises the text and creates a numpy array with its assigned labels."""
    # `tokenizer` and `labels` are assumed to be defined earlier in the script
    encoding = tokenizer(batch["text"], max_length=177, padding="max_length", truncation=True)
    labels_batch = {k: batch[k] for k in batch.keys() if k in labels}
    # One row per example, one column per label
    labels_matrix = np.zeros((len(batch["text"]), len(labels)))
    for idx, label in enumerate(labels):
        labels_matrix[:, idx] = labels_batch[label]
    encoding["labels"] = labels_matrix.tolist()
    return encoding
```
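For reference, a function like this is typically applied with `datasets.map` (the `dataset` variable here is an assumption):

```python
# Assumes `dataset` is a datasets.Dataset with a "text" column plus one column per label
encoded_dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
encoded_dataset.set_format("torch")
```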
```python
import torch.nn as nn
from transformers import AutoModelForSequenceClassification


class BertClassifier(nn.Module):
    """Bert Model for the multi label classification task."""

    def __init__(self, config, freeze_bert=False):
        """
        @param config (dict): layer sizes, plus the `labels`, `id2label` and
            `label2id` entries used below
        @param freeze_bert (bool): set to `False` to fine-tune the Bert model
        """
        super().__init__()
        self.bert = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased",
            problem_type="multi_label_classification",
            num_labels=len(config["labels"]),
            id2label=config["id2label"],
            label2id=config["label2id"],
        )
        self.classifier = nn.Sequential(
            nn.Linear(config["hidden_layer_size"], config["classifier_intermediate_layer_size"]),
            nn.ReLU(),
            nn.Linear(config["classifier_intermediate_layer_size"], config["classifier_layer_size"]),
        )
        self.sigmoid = nn.Sigmoid()
        # Freeze the Bert model
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask):
        """
        Feed input to BERT and the classifier to compute logits.
        @param input_ids (torch.Tensor): an input tensor with shape
            (batch_size, max_length)
        @param attention_mask (torch.Tensor): a tensor that holds attention
            mask information with shape (batch_size, max_length)
        @return logits (torch.Tensor): an output tensor with shape
            (batch_size, num_labels)
        """
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Extract the last hidden state of the token `[CLS]` for the classification task
        print(outputs.keys())
        # Feed input to classifier to compute logits
        logits = self.classifier(outputs["logits"])
        return logits
```
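To see where the `(32x17 and 768x50)` mismatch comes from, here is a quick shape check (purely illustrative; `model`, the batch size and the dummy tensors are assumptions, and `config` must also contain the `labels`, `id2label` and `label2id` entries referenced in `__init__`):

```python
import torch

# Assumes `model` is a BertClassifier instance
dummy_ids = torch.zeros((32, 177), dtype=torch.long)
dummy_mask = torch.ones((32, 177), dtype=torch.long)

with torch.no_grad():
    bert_out = model.bert(input_ids=dummy_ids, attention_mask=dummy_mask)

print(bert_out["logits"].shape)
# torch.Size([32, 17]): the sequence-classification model has already applied
# its own 768 -> 17 head, so feeding these logits into nn.Linear(768, 50)
# raises the "(32x17 and 768x50)" error above.
```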
```python
import torch
from transformers import Trainer


class MultilabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        """
        Custom loss calculation using BCEWithLogitsLoss; returns the loss, and
        also the outputs if the `return_outputs` flag is set to True.
        This function is used during training, evaluation and prediction,
        i.e. every time a batch is processed. The default loss function is here:
        https://github.com/huggingface/transformers/blob/820c46a707ddd033975bc3b0549eea200e64c7da/src/transformers/trainer.py#L2561

        Args:
            model: the model we're training
            inputs: a dictionary of input tensors
            return_outputs: if True, the loss and the model outputs are
                returned. If False, only the loss is returned. Defaults to False.

        Returns:
            The loss and the outputs of the model.
        """
        # Pull the labels out of the inputs before the forward pass
        labels = inputs.pop("labels")
        # Forward pass
        outputs = model(**inputs)
        logits = outputs.logits
        # Compute custom loss
        loss_fct = torch.nn.BCEWithLogitsLoss()
        loss = loss_fct(
            logits.view(-1, self.model.config.num_labels),
            labels.float().view(-1, self.model.config.num_labels),
        )
        return (loss, outputs) if return_outputs else loss
```
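For completeness, a minimal sketch of how this trainer might be wired up (the `TrainingArguments` values and variable names are placeholders, not from the original post):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="multilabel-bert",      # placeholder path
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = MultilabelTrainer(
    model=model,                       # e.g. the sequence-classification model
    args=training_args,
    train_dataset=encoded_dataset,     # the tokenized dataset from above
)
trainer.train()
```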
I think this is nearly it! I looked at the source code for the classifier in Transformers, and it was a matter of adding the layers to the `self.classifier` attribute.
```python
class CustomBertClassifier(nn.Module):  # Auto* classes can't be subclassed directly
    def __init__(self, num_labels, id2label, label2id):
        super().__init__()
        self.bert = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased",
            problem_type="multi_label_classification",
            num_labels=num_labels,
            id2label=id2label,
            label2id=label2id,
        )
        self.classifier = nn.Sequential(
            nn.Linear(768, 50),
            nn.ReLU(),
            nn.Linear(50, 17),
        )
```
I was having problems with the forward method above, because the tensors had different shapes. The head of the model has one layer by default, going from 768 neurons (the hidden state size) to 17 (the number of classes). This is not exactly what was initially implemented, which had a 3-layer head: Linear(768, 50), ReLU, Linear(50, 17). I need to write a custom classifier class to include this.
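A minimal sketch of that idea, assuming the goal is simply to swap the default head on the loaded model for the 3-layer one (this illustrates the `self.classifier` approach described above; it is not a confirmed solution from this thread):

```python
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=17,
)
# Replace the default 768 -> 17 linear head with the 3-layer head
model.classifier = nn.Sequential(
    nn.Linear(768, 50),
    nn.ReLU(),
    nn.Linear(50, 17),
)
```

Since `BertForSequenceClassification.forward` calls `self.classifier` on the pooled output, the swapped-in `nn.Sequential` is used transparently, and the 768-dimensional hidden state (rather than the 17-dimensional logits) is what reaches the new layers.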