Closed fsteckel closed 1 year ago
cc @sanchit-gandhi maybe?
Hey @fsteckel! Should be possible - if you first update to the latest transformers version:
pip install --upgrade transformers
You should then be able to do:
out = pipeline(sample["audio"], generate_kwargs={"task": "transcribe"})
which will forward the task argument on to Whisper. See AutomaticSpeechRecognitionPipeline.__call__ (the generate_kwargs argument)
and modeling_whisper.py#L1509 for details.
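A minimal, self-contained sketch of the suggestion above (the checkpoint "openai/whisper-tiny" and the silent dummy audio are illustrative choices, not from the thread):

```python
import numpy as np
from transformers import pipeline

# Build an ASR pipeline around a small Whisper checkpoint.
# Note the two different meanings of "task": the first argument of
# pipeline() selects the pipeline type ("automatic-speech-recognition"),
# while generate_kwargs={"task": ...} is forwarded to Whisper's generate().
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# One second of silence at 16 kHz stands in for a real audio sample here.
audio = {"raw": np.zeros(16000, dtype=np.float32), "sampling_rate": 16000}

# task="transcribe" keeps the output in the spoken language (e.g. German
# in, German out), whereas task="translate" would translate to English.
result = asr(audio, generate_kwargs={"task": "transcribe", "language": "german"})
print(result["text"])
```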
Currently, when using a non-English audio file, speechbox automatically translates the diarized transcription to English. Whisper supports controlling this via its task argument ('transcribe' vs. 'translate'), but I was unable to forward this option to the Whisper ASR model, because the keyword task is already used to select the 'automatic-speech-recognition' pipeline. Whisper by itself is fully capable of transcribing German input audio into German text; however, the result is then not speaker-diarized. Is there a way to get around this?