huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
Apache License 2.0

Whisper-large-v3 transcript is trimmed #1972

Open yv0vaa opened 1 month ago

yv0vaa commented 1 month ago

System Info

optimum 1.21.2
Ubuntu 22.04.4 LTS
CUDA 12.3
cuda-toolkit 11.7
onnxruntime 1.18.1

Who can help?

No response



Reproduction (minimal, reproducible, runnable)

import os
from transformers import WhisperForConditionalGeneration, WhisperProcessor, PretrainedConfig
import torch
import torchaudio
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

model_name = 'openai/whisper-large-v3'
model_path = 'whisper-large-v3'

processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_config = PretrainedConfig.from_pretrained(model_name)
sessions = ORTModelForSpeechSeq2Seq.load_model(
    os.path.join(model_path, 'encoder_model.onnx'),
    os.path.join(model_path, 'decoder_model.onnx'),
)
# NOTE: the constructor arguments are cut off in the original post and are left elided here
model = ORTModelForSpeechSeq2Seq(...)

audio, sr = torchaudio.load("example.ogg")
audio = torchaudio.functional.resample(audio[0], sr, 16000)
input_features = processor(audio.cpu(), return_tensors="pt", sampling_rate=16000, max_new_tokens=1000)
predicted_ids = model.generate(input_features)[0]
transcription = processor.decode(predicted_ids)

Expected behavior

The final transcript is incomplete: it is cut off in the middle of the speech. I tried changing the max_tokens and max_new_tokens parameters, but nothing changed. I also couldn't figure out how to pass the compute type and batch size as parameters; PretrainedConfig and GenerationConfig don't have such parameters. Could anyone help me?

IlyasMoutawwakil commented 1 month ago

Hey @yv0vaa, would you have the time to try out the branch in #1971 and see if it fixes your issue?

yv0vaa commented 1 month ago

Good afternoon @IlyasMoutawwakil, thanks, but unfortunately it didn't help.

IlyasMoutawwakil commented 1 month ago

Oh, I just noticed that you're passing max_new_tokens to the processor and not to generate. Is the behavior different from that of transformers?
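
To illustrate why the placement of max_new_tokens matters, here is a toy sketch. The functions toy_processor and toy_generate are hypothetical stand-ins, not the transformers or optimum API, and the sketch assumes the preprocessing step silently accepts and discards keyword arguments it does not know.

```python
# Toy stand-ins (hypothetical, NOT the transformers/optimum classes) showing why a
# generation argument has no effect when it is handed to the preprocessing step.

def toy_processor(audio, sampling_rate, **kwargs):
    """Turn raw samples into 'features'; unknown keyword arguments are ignored."""
    return {"input_features": list(audio)}


def toy_generate(input_features, max_new_tokens=4):
    """Emit one token per input frame, stopping after max_new_tokens tokens."""
    tokens = [f"tok{i}" for i in range(len(input_features))]
    return tokens[:max_new_tokens]


audio = [0.0] * 10

# max_new_tokens given to the processor: it is swallowed, generation keeps its default.
features = toy_processor(audio, sampling_rate=16000, max_new_tokens=8)
ignored = toy_generate(features["input_features"])

# max_new_tokens given to generate: the limit actually changes.
features = toy_processor(audio, sampling_rate=16000)
applied = toy_generate(features["input_features"], max_new_tokens=8)

print(len(ignored), len(applied))  # prints: 4 8
```

This only shows how the argument is routed; it does not by itself explain the trimmed transcript reported in this issue.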

yv0vaa commented 1 month ago

Maybe I'm doing something wrong, but nothing changes. Varying max_new_tokens in either processor.__call__ or model.generate does not affect the model's behavior.