huggingface / optimum

šŸš€ Accelerate training and inference of šŸ¤— Transformers and šŸ¤— Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Whisper-large-v3 transcript is trimmed #1972

Open yv0vaa opened 1 month ago

yv0vaa commented 1 month ago

System Info

optimum 1.21.2
Ubuntu 22.04.4 LTS
CUDA 12.3
cuda-toolkit 11.7
onnxruntime 1.18.1

Who can help?

No response

Reproduction (minimal, reproducible, runnable)

import os

import torch
import torchaudio
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import PretrainedConfig, WhisperProcessor

model_name = 'openai/whisper-large-v3'
model_path = 'whisper-large-v3'  # directory containing the exported ONNX files

processor = WhisperProcessor.from_pretrained(model_name)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the exported encoder/decoder sessions and wrap them in an ORT model
model_config = PretrainedConfig.from_pretrained(model_name)
sessions = ORTModelForSpeechSeq2Seq.load_model(
    os.path.join(model_path, 'encoder_model.onnx'),
    os.path.join(model_path, 'decoder_model.onnx'),
)
model = ORTModelForSpeechSeq2Seq(
    sessions[0],  # encoder session
    sessions[1],  # decoder session
    model_config,
    model_path,
    use_cache=False,
).to(device)

# Load the audio and resample it to Whisper's expected 16 kHz
audio, sr = torchaudio.load("example.ogg")
audio = torchaudio.functional.resample(audio[0], sr, 16000)

input_features = processor(
    audio.cpu(),
    return_tensors="pt",
    sampling_rate=16000,
    max_new_tokens=1000,
).input_features.to(device)

predicted_ids = model.generate(input_features)[0]
transcription = processor.decode(predicted_ids)
print(transcription)

Expected behavior

For some reason the final transcript is incomplete: it is cut off in the middle of the speech. I've tried changing the max_tokens and max_new_tokens parameters, but nothing changed. I also couldn't figure out how to pass the compute type and batch size as parameters; neither PretrainedConfig nor GenerationConfig has such fields. Could anyone help me?
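For reference, max_new_tokens is a generation-time argument in transformers, and optimum's ORT models expose the same generate API: it is passed to model.generate or set on a GenerationConfig, not on PretrainedConfig or the processor. Batch size is simply the leading dimension of input_features, and for an ONNX model the compute type is fixed when the model is exported or quantized rather than chosen at generation time. A minimal sketch, reusing model, model_name, and input_features from the repro above (the 440-token limit is an arbitrary example value; Whisper's decoder context caps out at 448 tokens):

from transformers import GenerationConfig

# Option 1: pass the limit directly to generate
predicted_ids = model.generate(input_features, max_new_tokens=440)[0]

# Option 2: set it once on a GenerationConfig and reuse it
gen_config = GenerationConfig.from_pretrained(model_name)
gen_config.max_new_tokens = 440
predicted_ids = model.generate(input_features, generation_config=gen_config)[0]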

IlyasMoutawwakil commented 1 month ago

Hey @yv0vaa, would you have time to try out the branch in #1971 and see if it fixes your issue?

yv0vaa commented 1 month ago

Good afternoon @IlyasMoutawwakil, thanks, but unfortunately it didn't help.

IlyasMoutawwakil commented 1 month ago

Oh, I just noticed that you're passing max_new_tokens to the processor and not to generate. Is the behavior different from that of transformers?
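For anyone following along, the relevant lines of the repro with the argument moved would look like this (an untested sketch; unknown kwargs such as max_new_tokens passed to the processor are most likely swallowed silently by the feature extractor rather than raising an error):

# Feature extraction: only audio-related kwargs belong here
input_features = processor(
    audio.cpu(), return_tensors="pt", sampling_rate=16000
).input_features.to(device)

# Generation: token limits belong here
predicted_ids = model.generate(input_features, max_new_tokens=440)[0]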

yv0vaa commented 1 month ago

Maybe I'm doing something wrong, but nothing changes. Varying max_new_tokens in both processor.__call__ and model.generate does not affect the behavior of the model.
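That observation would be consistent with the trimming happening at feature extraction rather than at generation: Whisper's feature extractor pads or truncates every input to a fixed 30-second window (3000 mel frames, 128 mel bins for large-v3), so a single generate call can never transcribe more than the first 30 seconds of example.ogg, regardless of max_new_tokens. A quick check, assuming the variables from the repro:

# Should print torch.Size([1, 128, 3000]) for large-v3: a fixed 30 s window
print(input_features.shape)

# Audio duration in seconds after resampling; anything past ~30 s is
# dropped by the default (non-chunked) feature extraction
print(audio.shape[-1] / 16000)

If the audio is longer than 30 seconds, chunked transcription via the transformers pipeline is the usual workaround, and optimum's ORT models can be plugged into it (a sketch, untested with this exact export):

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model=model,  # the ORTModelForSpeechSeq2Seq instance from the repro
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,  # split long audio into 30 s chunks and merge the text
)
print(asr("example.ogg")["text"])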