import json
from speechbox import ASRDiarizationPipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook
pipe = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-base",
    diarizer_model="pyannote/speaker-diarization-3.1",
)
with ProgressHook() as hook:
    output = pipe("audio.mp3", hook=hook)
with open("output.json", "w") as f:
    json.dump(output, f)
Output
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Question
Where in this pipeline do I specify `generate_kwargs = {"language": "Hindi"}` so that transcription runs in Hindi?