Audio Classification fails to do regression even though the documentation says it should under certain config

nevikw39 commented 11 months ago

System Info

transformers version: 4.35.1
Platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.35
Python version: 3.11.5
Huggingface_hub version: 0.17.3
Safetensors version: 0.4.0
Accelerate version: 0.24.1
Accelerate config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- debug: True
- num_processes: 8
- machine_rank: 0
- num_machines: 2
- gpu_ids: all
- main_process_ip: 10.18.18.1
- main_process_port: 8080
- rdzv_backend: static
- same_network: True
- main_training_function: main
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
PyTorch version (GPU?): 2.1.0+cu121 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: True
Using distributed or parallel set-up in script?: True

Who can help?

Seems like @sanchit-gandhi would be of help when it comes to Whisper.

In fact, this issue is easy to fix: I have made it work on our machine by directly modifying the source code of the transformers library. Although I am going to create a pull request, I think I should still submit an issue here.

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

Code Sample

The dataset used below is private for licensing reasons, so anyone who wants to reproduce this will need to find a suitable dataset for audio regression.

#!/home/nevikw/miniconda3/envs/ml-project/bin/python

from argparse import ArgumentParser
from random import randint
import warnings

from datasets import load_dataset, Audio, Value
from transformers import (
    AutoFeatureExtractor,
    AutoModelForAudioClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback,
)
import numpy as np
from sklearn.metrics import mean_squared_error

warnings.filterwarnings("ignore")

ap = ArgumentParser()
ap.add_argument("-m", "--base-model", type=str, default="openai/whisper-large-v3")
ap.add_argument("-d", "--sample-duration", type=int, default=30)
ap.add_argument("-b", "--batch-size", type=int, default=4)
ap.add_argument("-g", "--grad-accu-step", type=int, default=8)

args = ap.parse_args()

feature_extractor = AutoFeatureExtractor.from_pretrained(args.base_model)

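def preprocess(examples):
    # Crop each clip to a random window of at most `sample_duration` seconds.
    window = feature_extractor.sampling_rate * args.sample_duration
    clips = []
    for audio in examples["audio"]:
        length = min(len(audio["array"]), window)
        start = randint(0, len(audio["array"]) - length)
        clips.append(audio["array"][start : start + length])
    return feature_extractor(clips, sampling_rate=feature_extractor.sampling_rate, do_normalize=True)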

dataset = (
    load_dataset("nevikw39/ADReSSo")
    .cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))
    .cast_column("mmse", Value("float"))
)
dataset["train"], dataset["valid"] = dataset["train"].train_test_split(.25).values()

mean = np.mean(dataset["train"]["mmse"])
std = np.std(dataset["train"]["mmse"])

encoded_dataset = (
    dataset
    .map(preprocess, remove_columns=["audio"], batched=True, load_from_cache_file=False)
    .map(lambda batch: {"label": (np.array(batch["mmse"]) - mean) / std}, remove_columns=["label"], batched=True, load_from_cache_file=False)
)

model = AutoModelForAudioClassification.from_pretrained(args.base_model, num_labels=1)

training_args = TrainingArguments(
    output_dir="models/" + args.base_model[args.base_model.index('/') + 1 :] + "_ADReSSo-MMSE",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=args.batch_size,
    per_device_eval_batch_size=args.batch_size*2,
    gradient_accumulation_steps=args.grad_accu_step,
    num_train_epochs=100,
    warmup_ratio=.05,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="rmse",
    greater_is_better=False,
    push_to_hub_organization="NTHU-ML-2023-team19",
    push_to_hub=False,
    hub_private_repo=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["valid"],
    tokenizer=feature_extractor,
    compute_metrics=lambda eval_pred: {
        "rmse": mean_squared_error(eval_pred.label_ids, eval_pred.predictions, squared=False) * std,
    },
    callbacks=[EarlyStoppingCallback(10)],
)

trainer.train()

print(trainer.evaluate(encoded_dataset["test"]))

trainer.save_model("models/" + args.base_model[args.base_model.index('/') + 1 :] + "_ADReSSo-MMSE")

Error Message

Traceback (most recent call last):
  File "/home/nevikw/ML_Project/./acoustic_ft_mmse.py", line 106, in <module>
    trainer.train()
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/transformers/trainer.py", line 2725, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/transformers/trainer.py", line 2748, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 2419, in forward
    loss = loss_fct(logits.view(-1, self.config.num_labels), labels.view(-1))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/modules/loss.py", line 1179, in forward
    return F.cross_entropy(input, target, weight=self.weight,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nevikw/miniconda3/envs/ml-project/lib/python3.11/site-packages/torch/nn/functional.py", line 3053, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Float'
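
The last frames of the traceback show CrossEntropyLoss receiving float targets. The following is a minimal sketch that appears to trigger the same kind of failure outside Whisper, assuming only PyTorch is installed. The tensor shapes and values are illustrative stand-ins, not data from the issue, and on CPU the error message is worded differently from the CUDA one above.

```python
import torch
from torch import nn

# Illustrative stand-ins (not taken from the issue): a batch of 4 examples,
# one output per example (num_labels=1), and standardized float targets.
logits = torch.randn(4, 1)
labels = torch.tensor([0.3, -1.2, 0.8, 0.1])

# Classification loss: expects integer class indices, so float targets fail.
try:
    nn.CrossEntropyLoss()(logits.view(-1, 1), labels.view(-1))
    cross_entropy_error = None
except RuntimeError as err:
    cross_entropy_error = err

# Regression loss: accepts float targets of matching shape.
mse = nn.MSELoss()(logits.view(-1), labels.view(-1))

print("CrossEntropyLoss raised:", cross_entropy_error)
print("MSELoss:", mse.item())
```

With the same inputs, MSELoss returns a scalar loss, which is the behavior expected for num_labels=1.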

Proposed Solution

I found that the issue can be resolved by assigning an appropriate loss function to loss_fct in the forward() method of the WhisperForAudioClassification class. The pull request will be created later.
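
A rough sketch of what choosing the loss function from num_labels could look like is given here. This is not the code from the pull request: classification_or_regression_loss is a hypothetical standalone helper written for illustration, and the actual change inside forward() may be structured differently.

```python
import torch
from torch import nn


def classification_or_regression_loss(logits, labels, num_labels):
    """Hypothetical helper: pick the loss from num_labels."""
    if num_labels == 1:
        # Regression: one float output per example, compared with float targets.
        return nn.MSELoss()(logits.view(-1), labels.view(-1).to(logits.dtype))
    # Classification: integer class indices against num_labels logits.
    return nn.CrossEntropyLoss()(logits.view(-1, num_labels), labels.view(-1))


# Regression case (num_labels=1) with float targets.
reg_loss = classification_or_regression_loss(
    torch.tensor([[0.5], [-1.0]]), torch.tensor([0.0, -1.0]), num_labels=1
)
# Classification case (num_labels=3) with integer targets.
clf_loss = classification_or_regression_loss(
    torch.randn(2, 3), torch.tensor([0, 2]), num_labels=3
)
print(reg_loss.item(), clf_loss.item())
```

Branching on num_labels keeps the existing classification path untouched while giving the single-output case a loss that accepts float labels.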

Expected behavior

We should be able to perform the regression task, and the mean squared error loss should be computed during the forward pass if config.num_labels=1, as the documentation suggests.

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker commented 10 months ago

pinging @ylacombe and @sanchit-gandhi

sanchit-gandhi commented 10 months ago

Great catch @nevikw39 and many thanks for the PR - just left a review: https://github.com/huggingface/transformers/pull/27863#pullrequestreview-1806592151

