huggingface / optimum

šŸš€ Accelerate training and inference of šŸ¤— Transformers and šŸ¤— Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Whisper-large-v3 transcript is trimmed #1972

Open yv0vaa opened 1 month ago

yv0vaa commented 1 month ago

System Info

optimum 1.21.2
Ubuntu 22.04.4 LTS
CUDA 12.3
cuda-toolkit 11.7
onnxruntime 1.18.1

Who can help?

No response

Reproduction (minimal, reproducible, runnable)

import os

import torch
import torchaudio
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import PretrainedConfig, WhisperProcessor

model_name = 'openai/whisper-large-v3'
model_path = 'whisper-large-v3'  # directory containing the exported ONNX files

processor = WhisperProcessor.from_pretrained(model_name)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the exported encoder/decoder sessions and wrap them in an ORT model
model_config = PretrainedConfig.from_pretrained(model_name)
sessions = ORTModelForSpeechSeq2Seq.load_model(
    os.path.join(model_path, 'encoder_model.onnx'),
    os.path.join(model_path, 'decoder_model.onnx'),
)
model = ORTModelForSpeechSeq2Seq(
    sessions[0],  # encoder session
    sessions[1],  # decoder session
    model_config,
    model_path,
    use_cache=False,
).to(device)

# Load the audio and resample it to Whisper's expected 16 kHz
audio, sr = torchaudio.load("example.ogg")
audio = torchaudio.functional.resample(audio[0], sr, 16000)

input_features = processor(
    audio.cpu(),
    return_tensors="pt",
    sampling_rate=16000,
    max_new_tokens=1000,
).input_features.to(device)

predicted_ids = model.generate(input_features)[0]
transcription = processor.decode(predicted_ids)
print(transcription)

Expected behavior

For some reason the final transcript is incomplete: it is cut off in the middle of the speech. I've tried changing the max_tokens and max_new_tokens parameters, but nothing changed. I also couldn't figure out how to pass the compute type and batch size as parameters; neither PretrainedConfig nor GenerationConfig has such fields. Could anyone help me?
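For reference, max_new_tokens is a generation-time argument in transformers, and optimum's ORT models expose the same generate API: it is passed to model.generate or set on a GenerationConfig, not on PretrainedConfig or the processor. Batch size is simply the leading dimension of input_features, and for an ONNX model the compute type is fixed when the model is exported or quantized rather than chosen at generation time. A minimal sketch, reusing model, model_name, and input_features from the repro above (the 440-token limit is an arbitrary example value; Whisper's decoder context caps out at 448 tokens):

from transformers import GenerationConfig

# Option 1: pass the limit directly to generate
predicted_ids = model.generate(input_features, max_new_tokens=440)[0]

# Option 2: set it once on a GenerationConfig and reuse it
gen_config = GenerationConfig.from_pretrained(model_name)
gen_config.max_new_tokens = 440
predicted_ids = model.generate(input_features, generation_config=gen_config)[0]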

IlyasMoutawwakil commented 1 month ago

Hey @yv0vaa, would you have time to try out the branch in #1971 and see if it fixes your issue?

yv0vaa commented 1 month ago

Good afternoon @IlyasMoutawwakil, thanks, but unfortunately it didn't help.

IlyasMoutawwakil commented 1 month ago

Oh, I just noticed that you're passing max_new_tokens to the processor and not to generate. Is the behavior different from that of transformers?
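For anyone following along, the relevant lines of the repro with the argument moved would look like this (an untested sketch; unknown kwargs such as max_new_tokens passed to the processor are most likely swallowed silently by the feature extractor rather than raising an error):

# Feature extraction: only audio-related kwargs belong here
input_features = processor(
    audio.cpu(), return_tensors="pt", sampling_rate=16000
).input_features.to(device)

# Generation: token limits belong here
predicted_ids = model.generate(input_features, max_new_tokens=440)[0]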

yv0vaa commented 1 month ago

Maybe I'm doing something wrong, but nothing changes. Varying max_new_tokens in both processor.__call__ and model.generate does not affect the behavior of the model.
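That observation would be consistent with the trimming happening at feature extraction rather than at generation: Whisper's feature extractor pads or truncates every input to a fixed 30-second window (3000 mel frames, 128 mel bins for large-v3), so a single generate call can never transcribe more than the first 30 seconds of example.ogg, regardless of max_new_tokens. A quick check, assuming the variables from the repro:

# Should print torch.Size([1, 128, 3000]) for large-v3: a fixed 30 s window
print(input_features.shape)

# Audio duration in seconds after resampling; anything past ~30 s is
# dropped by the default (non-chunked) feature extraction
print(audio.shape[-1] / 16000)

If the audio is longer than 30 seconds, chunked transcription via the transformers pipeline is the usual workaround, and optimum's ORT models can be plugged into it (a sketch, untested with this exact export):

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model=model,  # the ORTModelForSpeechSeq2Seq instance from the repro
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,  # split long audio into 30 s chunks and merge the text
)
print(asr("example.ogg")["text"])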