HubertForSequenceClassification does not handle regression tasks correctly; always uses CrossEntropyLoss

raopx commented 1 month ago

System Info

transformers version: 4.44.2
Platform: macOS-14.5-arm64-arm-64bit
Python version: 3.12.4
Huggingface_hub version: 0.24.7
Safetensors version: 0.4.5
Accelerate version: 0.34.2
Accelerate config: not found
PyTorch version (GPU?): 2.4.1 (False)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?: No

Who can help?

@ylacombe @eustlb

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoConfig, HubertForSequenceClassification

model_id = "ntu-spml/distilhubert"

# Configure the model for regression
config = AutoConfig.from_pretrained(model_id)
config.problem_type = "regression"
config.num_labels = 1

# Load the model with the configuration
model = HubertForSequenceClassification.from_pretrained(model_id, config=config)

# Prepare a sample input
batch = {
    'input_values': torch.randn(1, 16000),  # Example input tensor
    'labels': torch.tensor([120.0])         # Example label (float for regression)
}

outputs = model(input_values=batch['input_values'], labels=batch['labels'])

Analysis:

After investigating the issue, I found that HubertForSequenceClassification.forward() ignores config.problem_type. It always computes the loss with CrossEntropyLoss, whether the task is classification or regression.

Here is the relevant code from transformers/models/hubert/modeling_hubert.py:

https://github.com/huggingface/transformers/blob/8bd2b1e8c23234cd607ca8d63f53c1edfea27462/src/transformers/models/hubert/modeling_hubert.py#L1633C9-L1637C1

# Inside HubertForSequenceClassification.forward()
if labels is not None:
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(logits.view(-1, self.config.num_labels), labels.view(-1))
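
The effect of this hard-coded loss can be checked with plain PyTorch, without loading the model. The sketch below is only an illustration: the logits are random stand-ins for the classifier output (an assumption, not real HuBERT activations), shaped as they would be for a batch of one with num_labels = 1. The exact exception type and message may differ between PyTorch versions, so none is quoted here.

```python
import torch
from torch.nn import CrossEntropyLoss

# Stand-in tensors (assumed shapes): batch of 1, num_labels = 1
logits = torch.randn(1, 1)
labels = torch.tensor([120.0])  # float regression target, as in the reproduction

loss_fct = CrossEntropyLoss()
failed = False
try:
    # Same call pattern as in HubertForSequenceClassification.forward()
    loss_fct(logits.view(-1, 1), labels.view(-1))
except Exception as err:  # exact exception type may vary by PyTorch version
    failed = True
    print(f"CrossEntropyLoss rejected the regression target: {err}")

print("call failed:", failed)
```

CrossEntropyLoss expects integer class indices (or class probabilities with the same shape as the logits), so a single float target such as 120.0 does not fit either form.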

Expected behavior

HubertForSequenceClassification should select its loss function from config.problem_type. Other model implementations, such as BertForSequenceClassification, already do this in their forward method:

# Inside BertForSequenceClassification.forward()
if labels is not None:
    if self.config.problem_type == "regression":
        loss_fct = MSELoss()
        loss = loss_fct(logits.view(-1), labels.view(-1))
    elif self.config.problem_type == "single_label_classification":
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    # ...
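
As a rough sketch of what the same dispatch could look like outside the model, the standalone function below picks the loss from problem_type. The name compute_loss and its signature are hypothetical, chosen only for illustration. It covers just the two branches shown above and is not the code from any PR.

```python
import torch
from torch.nn import CrossEntropyLoss, MSELoss


def compute_loss(logits, labels, problem_type, num_labels):
    """Hypothetical helper: select the loss from problem_type (BERT-style)."""
    if problem_type == "regression":
        # Flatten both tensors so a (batch, 1) output matches (batch,) targets
        return MSELoss()(logits.view(-1), labels.view(-1))
    if problem_type == "single_label_classification":
        return CrossEntropyLoss()(logits.view(-1, num_labels), labels.view(-1))
    # Other problem types are deliberately left out of this sketch
    raise ValueError(f"Unsupported problem_type in this sketch: {problem_type!r}")


# Regression: the prediction 118.0 against the target 120.0 from the reproduction
reg_loss = compute_loss(
    torch.tensor([[118.0]]), torch.tensor([120.0]), "regression", num_labels=1
)
print(reg_loss.item())  # squared error of (118 - 120), i.e. 4.0

# Classification: two classes, integer class index as the target
cls_loss = compute_loss(
    torch.tensor([[2.0, 0.0]]), torch.tensor([0]), "single_label_classification", num_labels=2
)
print(cls_loss.item())
```

With this kind of dispatch, the float label from the reproduction goes through MSELoss instead of being passed to CrossEntropyLoss.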

aroun-coumar commented 1 month ago

Hey @raopx, I'll take up this issue. I'll first test and verify it locally, then create a PR if needed. Thanks!

ylacombe commented 1 month ago

Thanks for opening this issue @raopx! @aroun-coumar, let us know how it goes and if you need help!

aroun-coumar commented 1 month ago

Sure @ylacombe, I just started and I'll let you know. Thanks!

aroun-coumar commented 1 month ago

Please check out this PR: #33551
