huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[openai/whisper-tiny][torch.compile] Model compilation: AttributeError: 'DynamicCache' object has no attribute 'key_cache' #34626

Closed: daniil-lyakhov closed this issue 1 week ago

daniil-lyakhov commented 3 weeks ago

System Info

Who can help?

@ylacombe, @eustlb

Reproduction

Hi there! I'm trying to use torch.compile to speed up inference of the Whisper model, but I cannot get past the following error:

```python
import torch
import copy
import librosa
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline
from urllib.request import urlretrieve

model_id = "openai/whisper-tiny"
processor = AutoProcessor.from_pretrained(model_id)
pt_model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)

# Build an ASR pipeline around the eager PyTorch model.
pipe_pt = pipeline(
    "automatic-speech-recognition",
    model=pt_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device="cpu",
)

# Download a short English sample and load it at Whisper's 16 kHz sampling rate.
en_example_short = "courtroom.wav"
url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/courtroom.wav"
urlretrieve(url, en_example_short)
en_raw_speech, samplerate = librosa.load(str(en_example_short), sr=16000)
sample = copy.deepcopy(en_raw_speech)

# Eager inference works as expected.
pt_result = pipe_pt(sample)
print("*" * 20)
print(f"Result: {pt_result['text']}")

# Compile the inner WhisperModel and run the pipeline again.
pipe_pt.model.model = torch.compile(pipe_pt.model.model)

pt_result = pipe_pt(sample)  # Raises the error
print("*" * 20)
print(f"Result: {pt_result['text']}")
```

The expected output is something like:

```
Result:  Colonel Jessif, did you order the code rate? You don't have to answer that question. I'll answer the question. You want answers? I think I'm entitled. You want answers? I want the truth. You can't handle the truth.
```

Instead, the following error occurs:

```
Traceback (most recent call last):
  File "/home/dlyakhov/Projects/openvino_notebooks/notebooks/whisper-asr-genai/repro.py", line 39, in <module>
    pt_result = pipe_pt(sample)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1294, in __call__
    return next(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/generation_whisper.py", line 555, in generate
    init_tokens = self._retrieve_init_tokens(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/generation_whisper.py", line 1370, in _retrieve_init_tokens
    lang_ids = self.detect_language(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/generation_whisper.py", line 1474, in detect_language
    logits = self(**inputs, decoder_input_ids=decoder_input_ids).logits[:, -1]
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py", line 1767, in forward
    outputs = self.model(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py", line 1634, in forward
    decoder_outputs = self.decoder(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py", line 1240, in forward
    logger.warning_once(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 1064, in __call__
    result = self._inner_convert(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 952, in _compile
    raise InternalTorchDynamoError(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 219, in _fn
    return fn(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 634, in transform
    tracer.run()
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2796, in run
    super().run()
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in wrapper
    return inner_fn(self, inst)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1692, in CALL_FUNCTION_KW
    self.call_function(fn, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 830, in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/nn_module.py", line 899, in call_function
    return variables.UserFunctionVariable(fn, source=source).call_function(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 324, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 111, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 836, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 3011, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 3139, in inline_call_
    tracer.run()
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in wrapper
    return inner_fn(self, inst)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1692, in CALL_FUNCTION_KW
    self.call_function(fn, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 830, in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/lazy.py", line 156, in realize_and_forward
    return getattr(self.realize(), name)(*args, **kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/nn_module.py", line 899, in call_function
    return variables.UserFunctionVariable(fn, source=source).call_function(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 324, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 111, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 836, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 3011, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 3139, in inline_call_
    tracer.run()
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 491, in inner
    if truth_fn(mod):
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/cache_utils.py", line 406, in __len__
    return len(self.key_cache)
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
    raise AttributeError(
torch._dynamo.exc.InternalTorchDynamoError: AttributeError: 'DynamicCache' object has no attribute 'key_cache'

from user code:
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py", line 1324, in torch_dynamo_resume_in_forward_at_1240
    layer_outputs = decoder_layer(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py", line 732, in forward
    hidden_states, cross_attn_weights, cross_attn_present_key_value = self.encoder_attn(
  File "/home/dlyakhov/Projects/openvino_notebooks/.venv_39/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py", line 520, in forward
    if is_cross_attention and past_key_value and is_updated:

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
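
Reading the traceback, the failure is triggered in the decoder's cross-attention check (`if is_cross_attention and past_key_value and is_updated:` in `modeling_whisper.py`), where Dynamo ends up tracing `DynamicCache.__len__` and fails on `len(self.key_cache)`. As the error message itself suggests, I can suppress the exception and fall back to eager execution, but that just gives up the compilation instead of fixing it:

```python
# Workaround suggested by the error message itself: suppress Dynamo errors and
# fall back to eager execution for frames that fail to compile. This avoids the
# crash but also loses the torch.compile speedup for those frames.
import torch._dynamo
torch._dynamo.config.suppress_errors = True
```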

Expected behavior

```
Result:  Colonel Jessif, did you order the code rate? You don't have to answer that question. I'll answer the question. You want answers? I think I'm entitled. You want answers? I want the truth. You can't handle the truth.
```

Could you please help me with that? Thank you!

LysandreJik commented 3 weeks ago

Thanks for your issue @daniil-lyakhov! We'll take a look as soon as a bit of bandwidth frees up. cc @eustlb

hasadata commented 3 weeks ago

Any update on this? I'm facing the same issue.

LysandreJik commented 2 weeks ago

Regarding the cache, also cc @gante @zucchini-nlp @ArthurZucker

zucchini-nlp commented 2 weeks ago

@daniil-lyakhov hey, I am not sure compile is supposed to work out of the box with the ASR pipeline for Whisper. We recommend using StaticCache when compiling, and compiling with fullgraph=True so that the model doesn't recompile on every forward call. Please take a look at this tutorial on Whisper + compile.

Currently I can work around this error with some changes in the model code, but since I am not very familiar with the ASR pipeline, we'll need more time to figure out the correct fix.
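
In case it helps, the recipe from that tutorial looks roughly like the sketch below, adapted to the audio from your reproduction. This is an untested sketch: it reuses `en_raw_speech` from your script, and the exact compile mode and cache options may differ across transformers versions.

```python
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-tiny"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)

# Use a static KV cache so the decoder's shapes stay fixed across generation
# steps, which lets Dynamo compile one graph instead of recompiling each call.
model.generation_config.cache_implementation = "static"

# Compile the forward pass as a single graph rather than wrapping a submodule.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# `en_raw_speech` is the 16 kHz audio loaded in the reproduction script above.
input_features = processor(
    en_raw_speech, sampling_rate=16000, return_tensors="pt"
).input_features

predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

Note this calls `model.generate` directly instead of going through the ASR pipeline, which is where the DynamicCache path gets hit.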

daniil-lyakhov commented 1 week ago

> @daniil-lyakhov hey, I am not sure compile is supposed to work out of the box with the ASR pipeline for Whisper. We recommend using StaticCache when compiling, and compiling with fullgraph=True so that the model doesn't recompile on every forward call. Please take a look at this tutorial on Whisper + compile.
>
> Currently I can work around this error with some changes in the model code, but since I am not very familiar with the ASR pipeline, we'll need more time to figure out the correct fix.

@zucchini-nlp, thank you very much for your reply! That's exactly what I need 👍

ylacombe commented 1 week ago

Thanks for your issue @daniil-lyakhov and thanks @zucchini-nlp for pointing out some materials.

Closing for now since it seems solved. Let us know if that works for you.