huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers

WhisperForAudioClassification cannot evaluate during training using use_weighted_layer_sum #30104

Open chercheurkg opened 3 months ago

chercheurkg commented 3 months ago

System Info

transformers version: 4.40.0.dev0

Who can help?

speech models: @sanchit-gandhi

Reproduction

For a classification task, I tried to fine-tune the whisper-base pretrained model using WhisperForAudioClassification with use_weighted_layer_sum set to True. It threw the following error while evaluating during training:

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

I suspect the error occurs when it tries to get predictions while executing the following line of code in my compute_metrics function:

np.argmax(eval_pred.predictions, axis=1)

  1. Use the whisper-base pretrained model and set use_weighted_layer_sum to True:

        config = AutoConfig.from_pretrained(
            'openai/whisper-small',
            ..........
        )
        config.use_weighted_layer_sum = True

  2. Start training it using a labeled dataset (see the sketch below).
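A minimal sketch of this setup (the checkpoint name, label count, and feature-extractor wiring are illustrative assumptions, not copied verbatim from my run):

    # Hypothetical minimal reproduction; dataset loading/preprocessing elided.
    from transformers import AutoConfig, AutoFeatureExtractor, WhisperForAudioClassification

    model_id = "openai/whisper-base"  # assumed checkpoint
    config = AutoConfig.from_pretrained(model_id, num_labels=7)
    config.use_weighted_layer_sum = True  # the setting that triggers the error

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = WhisperForAudioClassification.from_pretrained(model_id, config=config)
    # Train with Trainer and a compute_metrics that calls
    # np.argmax(eval_pred.predictions, axis=1); evaluation then fails as above.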

Expected behavior

It should not throw the above error; it should work for both use_weighted_layer_sum = True and use_weighted_layer_sum = False. Note that it does not throw this error when executing the exact same code with use_weighted_layer_sum = False.

amyeroberts commented 3 months ago

cc @ylacombe too

ylacombe commented 3 months ago

Hey @chercheurkg, thanks for opening the issue! It would be great to have a script to reproduce the issue, better yet if it's on a toy dataset! Also, could you copy/paste more context from the traceback?

Thanks!

chercheurkg commented 3 months ago

@ylacombe ,

Thanks for your reply!!

  1. I followed the script from https://huggingface.co/sanchit-gandhi/whisper-medium-fleurs-lang-id. However, I set the following to use weighted layer sums:

config.use_weighted_layer_sum = True

  2. I did not use a toy dataset; it is a widely used ASR dataset that has been used successfully for other tasks.

  3. I ran it in an integrated cloud environment, which returned only the following message:

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

As I mentioned, it encounters this issue while running evaluation. Here is my compute_metrics function:

    def compute_metrics(eval_pred):
        predictions = np.argmax(eval_pred.predictions, axis=1)
        return metric.compute(predictions=predictions, references=eval_pred.label_ids)

It is very easy to reproduce if you use weighted layer sums by setting config.use_weighted_layer_sum = True.

chercheurkg commented 3 months ago

@ylacombe Is there any update? I have managed to get more of the traceback:

    ValueError                                Traceback (most recent call last)
    <ipython-input-23-eb3a2c31f55a> in <cell line: 5>()
         87 )
         88
    ---> 89 train_result = trainer.train()
         90

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
       1622             hf_hub_utils.enable_progress_bars()
       1623         else:
    -> 1624             return inner_training_loop(
       1625                 args=args,
       1626                 resume_from_checkpoint=resume_from_checkpoint,

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
       2027                 self.control = self.callback_handler.on_step_end(args, self.state, self.control)
       2028
    -> 2029                 self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
       2030             else:
       2031                 self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
       2410         metrics = None
       2411         if self.control.should_evaluate:
    -> 2412             metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
       2413             self._report_to_hp_search(trial, self.state.global_step, metrics)
       2414

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
       3227
       3228         eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
    -> 3229         output = eval_loop(
       3230             eval_dataloader,
       3231             description="Evaluation",

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
       3518                 )
       3519             else:
    -> 3520                 metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
       3521         else:
       3522             metrics = {}

    <ipython-input-22-dc8ddd9dd4ef> in compute_metrics(eval_pred)
          2 def compute_metrics(eval_pred):
          3     """Computes accuracy on a batch of predictions"""
    ----> 4     predictions = np.argmax(eval_pred.predictions, axis=1)
          5     return metric.compute(predictions=predictions, references=eval_pred.label_ids)
          6

    /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in argmax(a, axis, out, keepdims)
       1227     """
       1228     kwds = {'keepdims': keepdims} if keepdims is not np._NoValue else {}
    -> 1229     return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
       1230
       1231

    /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
         54     bound = getattr(obj, method, None)
         55     if bound is None:
    ---> 56         return _wrapit(obj, method, *args, **kwds)
         57
         58     try:

    /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
         43     except AttributeError:
         44         wrap = None
    ---> 45     result = getattr(asarray(obj), method)(*args, **kwds)
         46     if wrap:
         47         if not isinstance(result, mu.ndarray):

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

chercheurkg commented 2 months ago

@amyeroberts , @ylacombe Here are two more observations to add to this issue:

amyeroberts commented 2 months ago

Gentle ping @ylacombe

ylacombe commented 1 month ago

Hey @chercheurkg, thanks for the additional details! Have you checked the shape of the 2D array? Having a reproducing script would be of tremendous help tbh!

chercheurkg commented 1 month ago

@ylacombe , @amyeroberts , Thanks so very much for your reply!

1. len(eval_pred.predictions) is equal to 2
2. eval_pred.predictions[0] is a tuple of number of classes × evaluation batch size
3. eval_pred.predictions[1] is a tuple of 7 × 1556 (see the sketch below)
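As an illustration of why this breaks np.argmax (the array sizes here are hypothetical placeholders, not the exact ones from my run), on recent NumPy two differently-shaped arrays passed together reproduce exactly this error:

    import numpy as np

    # Hypothetical stand-ins for the two elements of eval_pred.predictions:
    logits = np.zeros((8, 7))     # (evaluation batch size, number of classes)
    extra = np.zeros((7, 1556))   # second output with an incompatible shape

    # np.argmax first tries np.asarray((logits, extra)), which cannot build a
    # homogeneous array from mismatched shapes:
    np.argmax((logits, extra), axis=1)
    # ValueError: setting an array element with a sequence. The requested array
    # has an inhomogeneous shape after 1 dimensions. The detected shape was
    # (2,) + inhomogeneous part.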

I used the exact same script from https://huggingface.co/sanchit-gandhi/whisper-medium-fleurs-lang-id. However, I set the following to use weighted layer sums: config.use_weighted_layer_sum = True

Please let me know if you need anything else.

amyeroberts commented 4 weeks ago

Another ping @ylacombe

ylacombe commented 2 days ago

Hey @chercheurkg, sorry for the late response!

I finally had time to take a look into this; here's what I found:

I found this comment that may explain the behaviour above.

This also explains why your GPU memory explodes (the hidden states stay on the GPU and quickly fill its memory). You can avoid this by setting eval_accumulation_steps=1 in your TrainingArguments.
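For example, a minimal sketch (output_dir and the other values are placeholders):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="whisper-lang-id",   # placeholder
        eval_accumulation_steps=1,      # move accumulated predictions to the CPU every step
    )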

A simple solution to your bug is to preprocess the "logits" to get the real logits:

    def preprocess_logits_for_metrics(logits, labels):
        # With use_weighted_layer_sum=True the model output bundles extra
        # tensors alongside the classification logits, so keep only the
        # first element; compute_metrics then receives a homogeneous array.
        return logits[0]

    # Initialize our trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=raw_datasets["train"] if training_args.do_train else None,
        eval_dataset=raw_datasets["eval"] if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=feature_extractor,
        preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    )
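Note that preprocess_logits_for_metrics runs on each evaluation batch before predictions are accumulated, so only the logits get gathered across the evaluation set, which should also keep eval memory use down.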

I hope this helps!