The `Trainer` class accepts two kinds of models as input: a pretrained model from `transformers`, or a plain PyTorch module.
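For context, a minimal sketch of the two kinds of inputs (the names and arguments here are illustrative, not from the original post):

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch.nn as nn

# Option 1: a pretrained transformers model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))

# Option 2: any torch.nn.Module; the Trainer expects its forward to either
# return a loss as the first element, or outputs a custom compute_loss can use
class CustomModule(nn.Module):
    def forward(self, input_ids, attention_mask, labels=None):
        ...
```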
I am trying to plug the classifier class implemented by Olesya into my `Trainer`. However, I am currently getting an error: `RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x17 and 768x50)`.
Note that I have tweaked it a little bit, especially the line that defines the last hidden layer, to remove the `[CLS]` tokens.
This is the code:
```python
import numpy as np

config = {
    "hidden_layer_size": 768,
    "classifier_intermediate_layer_size": 50,
    "classifier_layer_size": 17,
}


def tokenize(batch):
    """Tokenises the text and creates a numpy array with its assigned labels."""
    # `tokenizer` and `labels` are assumed to be defined earlier in the script
    encoding = tokenizer(batch["text"], max_length=177, padding="max_length", truncation=True)
    labels_batch = {k: batch[k] for k in batch.keys() if k in labels}
    # One row per example, one column per label
    labels_matrix = np.zeros((len(batch["text"]), len(labels)))
    for idx, label in enumerate(labels):
        labels_matrix[:, idx] = labels_batch[label]
    encoding["labels"] = labels_matrix.tolist()
    return encoding
```
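For reference, a function like this is typically applied with `datasets.map` (the `dataset` variable here is an assumption):

```python
# Assumes `dataset` is a datasets.Dataset with a "text" column plus one column per label
encoded_dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
encoded_dataset.set_format("torch")
```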
```python
import torch.nn as nn
from transformers import AutoModelForSequenceClassification


class BertClassifier(nn.Module):
    """Bert Model for the multi label classification task."""

    def __init__(self, config, freeze_bert=False):
        """
        @param config (dict): layer sizes, plus the `labels`, `id2label` and
            `label2id` entries used below
        @param freeze_bert (bool): set to `False` to fine-tune the Bert model
        """
        super().__init__()
        self.bert = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased",
            problem_type="multi_label_classification",
            num_labels=len(config["labels"]),
            id2label=config["id2label"],
            label2id=config["label2id"],
        )
        self.classifier = nn.Sequential(
            nn.Linear(config["hidden_layer_size"], config["classifier_intermediate_layer_size"]),
            nn.ReLU(),
            nn.Linear(config["classifier_intermediate_layer_size"], config["classifier_layer_size"]),
        )
        self.sigmoid = nn.Sigmoid()
        # Freeze the Bert model
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask):
        """
        Feed input to BERT and the classifier to compute logits.
        @param input_ids (torch.Tensor): an input tensor with shape
            (batch_size, max_length)
        @param attention_mask (torch.Tensor): a tensor that holds attention
            mask information with shape (batch_size, max_length)
        @return logits (torch.Tensor): an output tensor with shape
            (batch_size, num_labels)
        """
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Extract the last hidden state of the token `[CLS]` for the classification task
        print(outputs.keys())
        # Feed input to classifier to compute logits
        logits = self.classifier(outputs["logits"])
        return logits
```
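To see where the `(32x17 and 768x50)` mismatch comes from, here is a quick shape check (purely illustrative; `model`, the batch size and the dummy tensors are assumptions, and `config` must also contain the `labels`, `id2label` and `label2id` entries referenced in `__init__`):

```python
import torch

# Assumes `model` is a BertClassifier instance
dummy_ids = torch.zeros((32, 177), dtype=torch.long)
dummy_mask = torch.ones((32, 177), dtype=torch.long)

with torch.no_grad():
    bert_out = model.bert(input_ids=dummy_ids, attention_mask=dummy_mask)

print(bert_out["logits"].shape)
# torch.Size([32, 17]): the sequence-classification model has already applied
# its own 768 -> 17 head, so feeding these logits into nn.Linear(768, 50)
# raises the "(32x17 and 768x50)" error above.
```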
```python
import torch
from transformers import Trainer


class MultilabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        """
        Custom loss calculation using BCEWithLogitsLoss; returns the loss, and
        also the outputs if the `return_outputs` flag is set to True.
        This function is used during training, evaluation and prediction,
        i.e. every time a batch is processed. The default loss function is here:
        https://github.com/huggingface/transformers/blob/820c46a707ddd033975bc3b0549eea200e64c7da/src/transformers/trainer.py#L2561

        Args:
            model: the model we're training
            inputs: a dictionary of input tensors
            return_outputs: if True, the loss and the model outputs are
                returned. If False, only the loss is returned. Defaults to False.

        Returns:
            The loss and the outputs of the model.
        """
        # Pull the labels out of the inputs before the forward pass
        labels = inputs.pop("labels")
        # Forward pass
        outputs = model(**inputs)
        logits = outputs.logits
        # Compute custom loss
        loss_fct = torch.nn.BCEWithLogitsLoss()
        loss = loss_fct(
            logits.view(-1, self.model.config.num_labels),
            labels.float().view(-1, self.model.config.num_labels),
        )
        return (loss, outputs) if return_outputs else loss
```
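For completeness, a minimal sketch of how this trainer might be wired up (the `TrainingArguments` values and variable names are placeholders, not from the original post):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="multilabel-bert",      # placeholder path
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = MultilabelTrainer(
    model=model,                       # e.g. the sequence-classification model
    args=training_args,
    train_dataset=encoded_dataset,     # the tokenized dataset from above
)
trainer.train()
```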
I think this is nearly it! I looked at the source code for the classifier in Transformers, and it was a matter of adding the layers to the `self.classifier` attribute.
```python
class CustomBertClassifier(nn.Module):  # Auto* classes can't be subclassed directly
    def __init__(self, num_labels, id2label, label2id):
        super().__init__()
        self.bert = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased",
            problem_type="multi_label_classification",
            num_labels=num_labels,
            id2label=id2label,
            label2id=label2id,
        )
        self.classifier = nn.Sequential(
            nn.Linear(768, 50),
            nn.ReLU(),
            nn.Linear(50, 17),
        )
```
I was having problems with the forward method above, because the tensors had different shapes. The head of the model has one layer by default, going from 768 neurons (the hidden state size) to 17 (the number of classes). This is not exactly what was initially implemented, which had a 3-layer head: Linear(768, 50), ReLU, Linear(50, 17). I need to write a custom classifier class to include this.
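A minimal sketch of that idea, assuming the goal is simply to swap the default head on the loaded model for the 3-layer one (this illustrates the `self.classifier` approach described above; it is not a confirmed solution from this thread):

```python
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=17,
)
# Replace the default 768 -> 17 linear head with the 3-layer head
model.classifier = nn.Sequential(
    nn.Linear(768, 50),
    nn.ReLU(),
    nn.Linear(50, 17),
)
```

Since `BertForSequenceClassification.forward` calls `self.classifier` on the pooled output, the swapped-in `nn.Sequential` is used transparently, and the 768-dimensional hidden state (rather than the 17-dimensional logits) is what reaches the new layers.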