silvacarl2 opened 11 months ago
like this: https://github.com/openai/whisper/discussions/963

```
$ whisper --help
optional arguments:
  --initial_prompt INITIAL_PROMPT
                        optional text to provide as a prompt for the first
                        window. (default: None)

$ whisper-ctranslate2 --help
optional arguments:
  --initial_prompt INITIAL_PROMPT
                        optional text to provide as a prompt for the first
                        window. (default: None)
```
Yes, currently for batch size 1:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_speech = dataset[3]["audio"]["array"]

processor = WhisperProcessor.from_pretrained("distil-whisper/distil-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("distil-whisper/distil-large-v2")

# The dummy LibriSpeech audio is 16 kHz; passing sampling_rate avoids a warning
input_features = processor(input_speech, sampling_rate=16_000, return_tensors="pt").input_features

# --- Without prompt ---
output_without_prompt = model.generate(input_features)
print(processor.decode(output_without_prompt[0]))
# <|startoftranscript|><|en|><|transcribe|><|notimestamps|> He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.<|endoftext|>

# --- With prompt ---
# Let's change the spelling of "Leighton" -> "Layton" by passing it as a prompt
prompt_ids = processor.get_prompt_ids("Layton")
output_with_prompt = model.generate(input_features, prompt_ids=prompt_ids)
print(processor.decode(output_with_prompt[0]))
# <|startofprev|> Layton<|startoftranscript|><|en|><|transcribe|><|notimestamps|> He has grave doubts whether Sir Frederick Layton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.<|endoftext|>
```
I'll generalise this for batch size N upstream in Transformers!
THIS IS AWESOME!!!!!!!!!!!!!!!!!!! YOU ROCK!!!!!!!!!!!!!!!!!!!!
SO PERFECT!!!!!!!!!!!!!!!!!!!
Sorry about the delay 😅 Hoping we have this fixed for bs=N very shortly!
Take your time, this is so cool we will start testing with it now.

Question: for larger audio files, do we need to split them up into 30-second chunks?

Does it have initial_prompt support? We use this a lot.
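For context on the chunking question: Whisper-style models process audio in 30-second windows, so longer files are typically split before (or by) the transcription call. Below is a minimal sketch of that splitting arithmetic in plain Python; the helper name `split_into_chunks` and the 16 kHz rate are illustrative assumptions, not from this thread.

```python
# Sketch: splitting a long waveform into 30-second chunks before feeding
# each chunk to the model. Names here are illustrative, not from the thread.

SAMPLE_RATE = 16_000           # Whisper expects 16 kHz audio
CHUNK_SECONDS = 30             # Whisper's processing window
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def split_into_chunks(samples):
    """Return consecutive 30-second slices of a 1-D sample sequence."""
    return [samples[i:i + CHUNK_SAMPLES]
            for i in range(0, len(samples), CHUNK_SAMPLES)]

# Example: 65 seconds of (silent) audio -> three chunks of 30 s, 30 s, 5 s
fake_audio = [0.0] * (65 * SAMPLE_RATE)
chunks = split_into_chunks(fake_audio)
print([len(c) / SAMPLE_RATE for c in chunks])  # [30.0, 30.0, 5.0]
```

Note that manual splitting isn't always required: the Transformers ASR pipeline can chunk long audio for you via its `chunk_length_s` argument.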