huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers

WhisperForAudioClassification cannot evaluate during training using use_weighted_layer_sum #30104

Open chercheurkg opened 3 months ago

chercheurkg commented 3 months ago

System Info

transformers version: 4.40.0.dev0

Who can help?

speech models: @sanchit-gandhi

Reproduction

For a classification task, I tried to fine-tune the whisper-base pretrained model using WhisperForAudioClassification with use_weighted_layer_sum set to True. It threw the following error while evaluating during training:

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

I suspect the error occurs when it tries to get predictions while executing the following line of code in my compute_metrics function:

np.argmax(eval_pred.predictions, axis=1)

  1. Use the whisper-base pretrained model and set use_weighted_layer_sum to True:

        config = AutoConfig.from_pretrained(
            'openai/whisper-small',
            ..........
        )
        config.use_weighted_layer_sum = True

  2. Start training it using a labeled dataset (see the sketch below).
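A minimal sketch of this setup (the checkpoint name, label count, and feature-extractor wiring are illustrative assumptions, not copied verbatim from my run):

    # Hypothetical minimal reproduction; dataset loading/preprocessing elided.
    from transformers import AutoConfig, AutoFeatureExtractor, WhisperForAudioClassification

    model_id = "openai/whisper-base"  # assumed checkpoint
    config = AutoConfig.from_pretrained(model_id, num_labels=7)
    config.use_weighted_layer_sum = True  # the setting that triggers the error

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = WhisperForAudioClassification.from_pretrained(model_id, config=config)
    # Train with Trainer and a compute_metrics that calls
    # np.argmax(eval_pred.predictions, axis=1); evaluation then fails as above.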

Expected behavior

It should not throw the above error; it should work for both use_weighted_layer_sum = True and use_weighted_layer_sum = False. Note that it does not throw this error when executing the exact same code with use_weighted_layer_sum = False.

amyeroberts commented 3 months ago

cc @ylacombe too

ylacombe commented 3 months ago

Hey @chercheurkg, thanks for opening the issue! It would be great to have a script to reproduce the issue, better yet if it's on a toy dataset! Also, could you copy/paste more context from the traceback?

Thanks!

chercheurkg commented 3 months ago

@ylacombe ,

Thanks for your reply!!

  1. I followed the script from https://huggingface.co/sanchit-gandhi/whisper-medium-fleurs-lang-id. However, I set the following to use weighted layer sums:

config.use_weighted_layer_sum = True

  2. I did not use a toy dataset; it is a widely used ASR dataset that has been used successfully for other tasks.

  3. I ran it in an integrated cloud environment, which returned only the following message:

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

As I mentioned, it encounters this issue while running evaluation. Here is my compute_metrics function:

    def compute_metrics(eval_pred):
        predictions = np.argmax(eval_pred.predictions, axis=1)
        return metric.compute(predictions=predictions, references=eval_pred.label_ids)

It is very easy to reproduce if you use weighted layer sums by setting config.use_weighted_layer_sum = True.

chercheurkg commented 3 months ago

@ylacombe Is there any update? I have managed to get more of the traceback:

    ValueError                                Traceback (most recent call last)
    <ipython-input-23-eb3a2c31f55a> in <cell line: 5>()
         87 )
         88
    ---> 89 train_result = trainer.train()
         90

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
       1622             hf_hub_utils.enable_progress_bars()
       1623         else:
    -> 1624             return inner_training_loop(
       1625                 args=args,
       1626                 resume_from_checkpoint=resume_from_checkpoint,

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
       2027                 self.control = self.callback_handler.on_step_end(args, self.state, self.control)
       2028
    -> 2029                 self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
       2030             else:
       2031                 self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
       2410         metrics = None
       2411         if self.control.should_evaluate:
    -> 2412             metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
       2413             self._report_to_hp_search(trial, self.state.global_step, metrics)
       2414

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
       3227
       3228         eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
    -> 3229         output = eval_loop(
       3230             eval_dataloader,
       3231             description="Evaluation",

    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
       3518                 )
       3519             else:
    -> 3520                 metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
       3521         else:
       3522             metrics = {}

    <ipython-input-22-dc8ddd9dd4ef> in compute_metrics(eval_pred)
          2 def compute_metrics(eval_pred):
          3     """Computes accuracy on a batch of predictions"""
    ----> 4     predictions = np.argmax(eval_pred.predictions, axis=1)
          5     return metric.compute(predictions=predictions, references=eval_pred.label_ids)
          6

    /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in argmax(a, axis, out, keepdims)
       1227     """
       1228     kwds = {'keepdims': keepdims} if keepdims is not np._NoValue else {}
    -> 1229     return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
       1230
       1231

    /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
         54     bound = getattr(obj, method, None)
         55     if bound is None:
    ---> 56         return _wrapit(obj, method, *args, **kwds)
         57
         58     try:

    /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
         43     except AttributeError:
         44         wrap = None
    ---> 45     result = getattr(asarray(obj), method)(*args, **kwds)
         46     if wrap:
         47         if not isinstance(result, mu.ndarray):

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

chercheurkg commented 2 months ago

@amyeroberts , @ylacombe Here are two more observations to add to this issue:

amyeroberts commented 2 months ago

Gentle ping @ylacombe

ylacombe commented 1 month ago

Hey @chercheurkg, thanks for the additional details! Have you checked the shape of the 2D array? Having a reproducing script would be of tremendous help tbh!

chercheurkg commented 1 month ago

@ylacombe , @amyeroberts , Thanks so very much for your reply!

1. len(eval_pred.predictions) is equal to 2
2. eval_pred.predictions[0] is a tuple of number of classes × evaluation batch size
3. eval_pred.predictions[1] is a tuple of 7 × 1556 (see the sketch below)
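As an illustration of why this breaks np.argmax (the array sizes here are hypothetical placeholders, not the exact ones from my run), on recent NumPy two differently-shaped arrays passed together reproduce exactly this error:

    import numpy as np

    # Hypothetical stand-ins for the two elements of eval_pred.predictions:
    logits = np.zeros((8, 7))     # (evaluation batch size, number of classes)
    extra = np.zeros((7, 1556))   # second output with an incompatible shape

    # np.argmax first tries np.asarray((logits, extra)), which cannot build a
    # homogeneous array from mismatched shapes:
    np.argmax((logits, extra), axis=1)
    # ValueError: setting an array element with a sequence. The requested array
    # has an inhomogeneous shape after 1 dimensions. The detected shape was
    # (2,) + inhomogeneous part.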

I used the exact same script from https://huggingface.co/sanchit-gandhi/whisper-medium-fleurs-lang-id. However, I set the following to use weighted layer sums: config.use_weighted_layer_sum = True

Please let me know if you need anything else.

amyeroberts commented 4 weeks ago

Another ping @ylacombe

ylacombe commented 2 days ago

Hey @chercheurkg, sorry for the late response!

I finally had time to take a look into this; here's what I found:

I found this comment that may explain the behaviour above.

This also explains why your GPU memory explodes (the hidden states stay on the GPU and quickly fill its memory). You can avoid this by setting eval_accumulation_steps=1 in your TrainingArguments.
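For example, a minimal sketch (output_dir and the other values are placeholders):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="whisper-lang-id",   # placeholder
        eval_accumulation_steps=1,      # move accumulated predictions to the CPU every step
    )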

A simple solution to your bug is to preprocess the "logits" to get the real logits:

    def preprocess_logits_for_metrics(logits, labels):
        # With use_weighted_layer_sum=True the model output bundles extra
        # tensors alongside the classification logits, so keep only the
        # first element; compute_metrics then receives a homogeneous array.
        return logits[0]

    # Initialize our trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=raw_datasets["train"] if training_args.do_train else None,
        eval_dataset=raw_datasets["eval"] if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=feature_extractor,
        preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    )
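Note that preprocess_logits_for_metrics runs on each evaluation batch before predictions are accumulated, so only the logits get gathered across the evaluation set, which should also keep eval memory use down.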

I hope this helps!