huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Whisper Scoring Model Saving Errors due to Config+GenerationConfig #33845

Open gcervantes8 opened 1 month ago

gcervantes8 commented 1 month ago

System Info

Who can help?

Tagging the Speech team: @ylacombe, @eustlb, and @gante, since pull request #32863 seems very relevant and they might have good insight.

Information

Tasks

Reproduction

I'm trying to use the latest Transformers version (4.45.1) but I am getting an error when trying to save a WhisperForAudioClassification model.

Here is an example of the code that fails.

import torch
from transformers import WhisperForAudioClassification

whisper_model_name = "openai/whisper-base"

print('Creating Model')
device = 'cuda'
model = WhisperForAudioClassification.from_pretrained(
    pretrained_model_name_or_path=whisper_model_name, num_labels=3
)
model = model.to(device)
print('Model Created')

# Example inference on a dummy batch of log-mel spectrograms
temp_batch_size = 4
with torch.no_grad():
    test_audio = torch.zeros([temp_batch_size, 80, 3000], device=device)
    labels = torch.tensor(temp_batch_size * [0], dtype=torch.int64, device=device)
    scores = model(test_audio, labels=labels)
    print('Model Out: ' + str(scores))

model.save_pretrained("models/whisper-scoring-base-v1")  # raises the ValueError below

This is the error that I am getting.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[50], line 46
     43     print(model.config.num_labels)
     45 # model.save_pretrained(output_dir, state_dict=state_dict, safe_serialization=self.args.save_safetensors)
---> 46 model.save_pretrained("models/whisper-scoring-base-v1")
     47 # model.save

File /opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py:2628, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, token, save_peft_format, **kwargs)
   2625             setattr(model_to_save.generation_config, param_name, param_value)
   2626             setattr(model_to_save.config, param_name, None)
-> 2628     model_to_save.config.save_pretrained(save_directory)
   2629 if self.can_generate():
   2630     model_to_save.generation_config.save_pretrained(save_directory)

File /opt/conda/lib/python3.11/site-packages/transformers/configuration_utils.py:383, in PretrainedConfig.save_pretrained(self, save_directory, push_to_hub, **kwargs)
    381 non_default_generation_parameters = self._get_non_default_generation_parameters()
    382 if len(non_default_generation_parameters) > 0:
--> 383     raise ValueError(
    384         "Some non-default generation parameters are set in the model config. These should go into either a) "
    385         "`model.generation_config` (as opposed to `model.config`); OR b) a GenerationConfig file "
    386         "(https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) "
    387         f"\nNon-default generation parameters: {str(non_default_generation_parameters)}"
    388     )
    390 os.makedirs(save_directory, exist_ok=True)
    392 if push_to_hub:

ValueError: Some non-default generation parameters are set in the model config. These should go into either a) `model.generation_config` (as opposed to `model.config`); OR b) a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) 
Non-default generation parameters: {'max_length': 448, 'suppress_tokens': [1, 2, 7, 8, 9, 10, 14, 25, 26, 27, 28, 29, 31, 58, 59, 60, 61, 62, 63, 90, 91, 92, 93, 359, 503, 522, 542, 873, 893, 902, 918, 922, 931, 1350, 1853, 1982, 2460, 2627, 3246, 3253, 3268, 3536, 3846, 3961, 4183, 4667, 6585, 6647, 7273, 9061, 9383, 10428, 10929, 11938, 12033, 12331, 12562, 13793, 14157, 14635, 15265, 15618, 16553, 16604, 18362, 18956, 20075, 21675, 22520, 26130, 26161, 26435, 28279, 29464, 31650, 32302, 32470, 36865, 42863, 47425, 49870, 50254, 50258, 50358, 50359, 50360, 50361, 50362], 'begin_suppress_tokens': [220, 50257]}

This code reproduces the error. My actual goal is to fine-tune the model, and when I use the Trainer it fails every time it tries to save a checkpoint.

The way I understand it, when I instantiate the model, it loads both the config and the generation config from the pretrained Whisper checkpoint. The scoring model only uses the encoder, so all the generation parameters are useless to it, but I can't figure out how to remove them or avoid loading them in the first place.

The other option I was thinking of was to move them to model.generation_config by modifying the WhisperForAudioClassification class, or to move them into a GenerationConfig and save that to a file, but I'm not sure how I would avoid the error with that.
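
For the second option, here is an untested sketch of what I have in mind (the parameter names are copied from the error message above; I don't know if this is the intended approach):

from transformers import GenerationConfig

# Move the offending parameters out of model.config into a standalone
# GenerationConfig, then null them out in the config, mirroring what
# save_pretrained appears to do internally in the traceback above.
gen_param_names = ("max_length", "suppress_tokens", "begin_suppress_tokens")
generation_config = GenerationConfig(
    **{name: getattr(model.config, name) for name in gen_param_names}
)
for name in gen_param_names:
    setattr(model.config, name, None)

model.save_pretrained("models/whisper-scoring-base-v1")
# Keep the generation parameters alongside the model in case they are needed:
generation_config.save_pretrained("models/whisper-scoring-base-v1")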

Thank you for the help!

Expected behavior

Saving without an error.

niqodea commented 1 month ago

As a quick fix, add these lines right before saving the model:

# Remove the leftover generation parameters so the config check in
# save_pretrained passes
del model.config.__dict__["max_length"]
del model.config.__dict__["suppress_tokens"]
del model.config.__dict__["begin_suppress_tokens"]
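
More generally, the same workaround can target whatever the check flags instead of hard-coding the three names. This sketch relies on the private helper _get_non_default_generation_parameters that appears in the traceback above, so it may break across versions:

# Null out every generation parameter that save_pretrained would complain
# about; setting a value to None delegates it to the generation config.
for param_name in model.config._get_non_default_generation_parameters():
    setattr(model.config, param_name, None)

model.save_pretrained("models/whisper-scoring-base-v1")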
niqodea commented 1 month ago

Fixing the problem properly is a bit tricky. The root issue is that WhisperForAudioClassification still takes a full WhisperConfig as input, which contains generation parameters the model never uses. Then, when it's time to save the model, the logic that raises errors on non-default generation parameters does not check whether the model actually needs those parameters.

Here are some solutions I thought of:

gcervantes8 commented 1 month ago

Thanks, the quick fix resolved the issue for me.

I'm not sure which of the two solutions is better as a long-term fix, but they both make sense.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ylacombe commented 3 weeks ago

Hey @gcervantes8, thanks for opening this issue. And thanks @niqodea for the quick fix and proposing some solutions.

I'm not sure the two proposed solutions will work here, notably because we don't want to introduce edge cases directly in the code, and because adding another config would not be easily backward compatible and would be more hassle than it's worth.

The issue happens because a checkpoint meant for generation was used to initialize a model meant for classification. Logging a warning makes sense, but I'm not sure I understand why we raise an error. @ArthurZucker or @Rocketknight1, do you think we could/should raise a warning instead of an error?
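
For illustration, a minimal sketch of the softer behavior under discussion (not the actual transformers code; the function name is made up for the example):

import warnings

def check_non_default_generation_parameters(non_default_params: dict) -> None:
    """Warn, rather than raise, when generation parameters linger in a model config."""
    if non_default_params:
        warnings.warn(
            "Some non-default generation parameters are set in the model config "
            f"and will be ignored when saving: {non_default_params}"
        )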

Rocketknight1 commented 3 weeks ago

cc @gante for a generation config question, I think!