tom-010 opened this issue 1 month ago
Hey @tom-010, the way `transformers` is designed is to expose a simple, common loss function, but also to return the base logits in case you don't want that built-in loss. Just don't pass the labels and compute the loss outside the model, and you're good :ok_hand:
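For instance, something along these lines (just a sketch; the checkpoint, `num_labels`, dummy labels, and weights below are placeholders):

```python
import torch
from torch.nn import CrossEntropyLoss
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=5)

inputs = tokenizer("Jane lives in Berlin", return_tensors="pt")
labels = torch.zeros_like(inputs["input_ids"])  # dummy labels, one per token

outputs = model(**inputs)  # no `labels` kwarg, so outputs.loss is None and you just get logits
loss_fct = CrossEntropyLoss(weight=torch.ones(model.config.num_labels))  # e.g. class weights
loss = loss_fct(outputs.logits.view(-1, model.config.num_labels), labels.view(-1))
loss.backward()
```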
Thank you for the info, @LysandreJik! 😊 I initially followed that approach but encountered some issues while using the `Trainer`. I ended up subclassing `Trainer` and overriding the `compute_loss` method, as suggested in the docstring:
def compute_loss(self, model, inputs, return_outputs=False):
    """
    How the loss is computed by Trainer. By default, all models return the loss in the first element.
    Subclass and override for custom behavior.
    """
Here’s an example of how I implemented it:
from transformers import Trainer

class CustomLossTrainer(Trainer):
    def __init__(self, *args, loss_fct, **kwargs):
        super().__init__(*args, **kwargs)
        # Store the custom loss function (e.g., CrossEntropyLoss with class weights)
        self.loss_fct = loss_fct

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Manually compute the loss using the provided custom loss function
        loss = self.loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
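And roughly how I wire it up (the model, datasets, and weight values here are placeholders):

```python
import torch
from torch.nn import CrossEntropyLoss
from transformers import TrainingArguments

# Placeholder weights: down-weight the dominant "other" class at index 0.
class_weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 1.0])

trainer = CustomLossTrainer(
    model=model,                      # placeholder: a *ForTokenClassification model
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_ds,           # placeholder: tokenized dataset with a "labels" column
    eval_dataset=eval_ds,             # placeholder
    loss_fct=CrossEntropyLoss(weight=class_weights),
)
trainer.train()
```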
However, this approach involves copying a significant portion of the code from `Trainer.compute_loss`. While it works, it's not ideal, because I would miss out on any future updates or changes made to that logic in `Trainer`.
It would be great if there were a cleaner way to inject a custom loss function without needing to replicate existing code. IMO `model.loss_fct = CrossEntropyLoss(weight=class_weights)` would be a nice way. Or am I missing something?
If not: patching and subclassing work for me right now, so I'm fine, but I could contribute if this change is welcome. If not, feel free to close the issue :+1:
Note that the same issue exists in other token-classification models as well, e.g. `DebertaV2ForTokenClassification`.
Pinging @muellerzr for when he's back from leave, in case he wants to chime in; it would require quite a significant change across all models, however.
Thanks for the feature request though! I understand it would make things much easier in your case.
Feature request
In `transformers.models.bert.modeling_bert.BertForTokenClassification.forward`, the loss function (`loss_fct = CrossEntropyLoss()`) is currently hard-coded. To change it (e.g., to set class weights in `CrossEntropyLoss`), one currently has to monkey-patch the model. By making `loss_fct` an attribute (e.g., `self.loss_fct`), users could simply replace it and use custom loss functions during training.

Motivation
The motivation behind this proposal stems from the need to change the loss function when fine-tuning a pre-trained BERT model for token classification, particularly when dealing with imbalanced classes. In my use case, I need to prioritize recall, as most tokens belong to the "other" class. To achieve this, I need to set custom weights in the `CrossEntropyLoss`.
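For instance, something like the following (the values are placeholders for an illustrative five-label scheme, with index 0 standing in for the dominant "other" class):

```python
import torch
from torch.nn import CrossEntropyLoss

# Down-weight the dominant "other" class (index 0) so the minority classes drive the loss.
class_weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 1.0])
loss_fct = CrossEntropyLoss(weight=class_weights)
```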
However, since the loss function is hard-coded inside the `forward` method, modifying it currently requires overriding the entire method just to change one line.
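An abbreviated sketch of that workaround (the `class_weights` tensor and the trimmed-down `forward` signature are illustrative; in practice one ends up copying most of the stock method):

```python
import torch
from torch.nn import CrossEntropyLoss
from transformers import BertForTokenClassification
from transformers.modeling_outputs import TokenClassifierOutput

class_weights = torch.tensor([0.1, 1.0, 1.0])  # placeholder: per-class weights for a 3-label task

class WeightedBertForTokenClassification(BertForTokenClassification):
    # Abbreviated sketch of the workaround: skip the built-in loss and recompute it with weights.
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # Calling the stock forward without `labels` avoids its hard-coded CrossEntropyLoss().
        outputs = super().forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss(weight=class_weights)
            loss = loss_fct(outputs.logits.view(-1, self.num_labels), labels.view(-1))
        return TokenClassifierOutput(
            loss=loss,
            logits=outputs.logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
```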
By turning `loss_fct` into an attribute, we could avoid the need for monkey-patching: existing code would stay unchanged, but it would be easier to swap in a custom loss function when needed. The change could be as simple as creating the default in `__init__` (`self.loss_fct = CrossEntropyLoss()`) and referencing `self.loss_fct` in `forward`.
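A toy stand-in (not the actual modeling code, just to illustrate the pattern and how a user would then override the default):

```python
import torch
from torch import nn
from torch.nn import CrossEntropyLoss

class TinyTokenClassifier(nn.Module):
    """Minimal stand-in for a token-classification head, showing the proposed pattern."""
    def __init__(self, hidden_size=8, num_labels=3):
        super().__init__()
        self.num_labels = num_labels
        self.classifier = nn.Linear(hidden_size, num_labels)
        # Proposed pattern: the default loss lives on the instance instead of being
        # re-created inside forward(), so callers can simply replace it.
        self.loss_fct = CrossEntropyLoss()

    def forward(self, hidden_states, labels=None):
        logits = self.classifier(hidden_states)
        loss = None
        if labels is not None:
            loss = self.loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        return loss, logits

model = TinyTokenClassifier()
model.loss_fct = CrossEntropyLoss(weight=torch.tensor([0.1, 1.0, 1.0]))  # swap in class weights
loss, _ = model(torch.randn(2, 4, 8), labels=torch.randint(0, 3, (2, 4)))
```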
Your contribution
I am new to this repository, and this would be my first pull request. I would like to ask whether these kinds of changes are welcome and whether it makes sense to proceed with submitting a pull request for this improvement.