huggingface / speechbox

Apache License 2.0

Language Selection is Not Available for Whisper Model #40

alvynabranches commented 2 months ago

Code

import json

from speechbox import ASRDiarizationPipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

# Combined ASR + speaker-diarization pipeline: Whisper for transcription,
# pyannote for "who spoke when".
pipe = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-base",
    diarizer_model="pyannote/speaker-diarization-3.1",
)

# ProgressHook shows diarization progress while the audio is processed.
with ProgressHook() as hook:
    output = pipe("audio.mp3", hook=hook)

# Write the speaker-attributed transcript to disk.
with open("output.json", "w") as f:
    json.dump(output, f)

Output

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.

Question

Where do I specify `generate_kwargs = {"language": "hindi"}` so that the Whisper model transcribes in Hindi instead of auto-detecting the language?