huggingface / speechbox

Apache License 2.0

Unwanted automatic translation of non-English input during diarization. #20

Closed fsteckel closed 1 year ago

fsteckel commented 1 year ago

Currently, when given a non-English audio file, speechbox automatically translates the diarized transcript into English. Whisper has this feature, controlled by the task argument ('transcribe' vs. 'translate'). I was unable to forward this option to the Whisper ASR model, because the keyword task is already used to select 'automatic-speech-recognition'. Whisper by itself is fully capable of transcribing German input audio into German text, just without speaker diarization. Is there a way to get around this?

patrickvonplaten commented 1 year ago

cc @sanchit-gandhi maybe?

sanchit-gandhi commented 1 year ago

Hey @fsteckel! Should be possible - if you first update to the latest transformers version:

pip install --upgrade transformers

You should then be able to do:

out = pipeline(sample["audio"], generate_kwargs={"task": "transcribe"})

which will forward the task argument to Whisper. See AutomaticSpeechRecognitionPipeline.__call__.generate_kwargs and modeling_whisper.py#L1509 for details.
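
For anyone who wants to try the same idea outside speechbox, here is a minimal sketch using a plain transformers ASR pipeline. The helper names (build_generate_kwargs, transcribe_audio), the checkpoint "openai/whisper-tiny", and the optional language hint are assumptions for illustration and are not part of speechbox. The accepted form of the language value may depend on your transformers version.

```python
# Sketch only: the helper names and default checkpoint are illustrative assumptions.
VALID_TASKS = {"transcribe", "translate"}


def build_generate_kwargs(task="transcribe", language=None):
    """Build the generate_kwargs dict that the pipeline forwards to Whisper's generate()."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    kwargs = {"task": task}
    if language is not None:
        # Optional hint, e.g. "german"; omit it to let Whisper detect the language.
        kwargs["language"] = language
    return kwargs


def transcribe_audio(audio, model_name="openai/whisper-tiny", task="transcribe", language=None):
    """Run Whisper through a plain ASR pipeline without translating to English.

    `audio` can be a path to an audio file or a dict holding the raw samples and
    their sampling rate, as returned by a datasets audio column.
    """
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    # "automatic-speech-recognition" selects the pipeline; the Whisper task
    # ("transcribe" vs. "translate") travels separately inside generate_kwargs.
    return asr(audio, generate_kwargs=build_generate_kwargs(task, language))
```

Calling transcribe_audio("interview_de.wav", language="german") (a hypothetical file name) should then return the text in the source language. The same generate_kwargs dict can be passed to the speechbox pipeline call shown above.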