tomchang25 / whisper-auto-transcribe

Auto transcribe tool based on whisper
MIT License

speaker diarization #6

Open tomchang25 opened 1 year ago

tomchang25 commented 1 year ago

I tested pyannote-audio for speaker diarization. The error rate is about 30%, and it requires a lot of extra installation steps. On the other hand, segmentation (and VAD) works pretty well. I'll put speaker diarization on hold until the beta version is complete.

tomchang25 commented 1 year ago

A successful example of Whisper + speaker diarization:

https://github.com/MahmoudAshraf97/whisper-diarization

tomchang25 commented 1 year ago

Another example: https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization/blob/main/app.py

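    import os

    import torch
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
    from transformers import pipeline

    # MODEL_NAME and lang are not part of this excerpt; they are defined elsewhere in the linked app.py.
    # Whisper ASR pipeline: use GPU 0 if available, otherwise the CPU.
    device = 0 if torch.cuda.is_available() else "cpu"
    pipe = pipeline(
        task="automatic-speech-recognition",
        model=MODEL_NAME,
        chunk_length_s=30,
        device=device,
    )
    os.makedirs('output', exist_ok=True)
    # Force the decoder to transcribe in the chosen language.
    pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")

    # Speaker embedding model used to tell speakers apart.
    embedding_model = PretrainedSpeakerEmbedding(
        "speechbrain/spkrec-ecapa-voxceleb",
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"))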
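Whichever diarization backend is used, the last step is attaching a speaker label to each Whisper segment. Here is a minimal sketch of one common way to do that, by picking the speaker whose turns overlap the segment the most. It is not taken from either linked project; `Segment`, `Turn`, `assign_speakers`, and the sample timestamps are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk (hypothetical shape: start/end in seconds plus text)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn (hypothetical shape: start/end in seconds plus speaker label)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder when no turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


# Made-up sample data, only to show the call shape.
segments = [
    Segment(0.0, 2.5, "Hello."),
    Segment(2.5, 6.0, "Hi, how are you?"),
    Segment(9.0, 10.0, "Bye."),
]
turns = [
    Turn(0.0, 2.0, "SPEAKER_00"),
    Turn(2.0, 6.5, "SPEAKER_01"),
]
result = assign_speakers(segments, turns)
for speaker, text in result:
    print(f"{speaker}: {text}")
```

Overlap voting keeps the transcription and diarization steps independent, so the diarization backend can be swapped later without touching the Whisper side. Segments that span a speaker change still get a single label, so this alone would not fix the error rate mentioned above.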