
tomchang25 / whisper-auto-transcribe

Auto transcribe tool based on whisper

MIT License

205 stars, 14 forks

speaker diarization #6

Open tomchang25 opened 1 year ago

tomchang25 commented 1 year ago

I have tested pyannote-audio for speaker diarization. The error rate is about 30%, and it requires a lot of extra installation steps. On the other hand, segmentation (and VAD) works pretty well. I'll put speaker diarization on hold until the beta version is complete.
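The comment above doesn't say how the 30% figure was measured. As a rough illustration only, here is a simplified, frame-based version of the usual diarization error rate (missed speech + false alarms + speaker confusion, divided by reference speech time). The function names, the 0.1 s frame step, and the example turns are all made up for this sketch. It ignores overlapping speech and scoring collars, so it is not a replacement for a proper metrics library.

```python
from itertools import permutations


def frame_labels(turns, n_frames, step=0.1):
    """Turn (start, end, speaker) tuples into one label per frame (None = silence)."""
    labels = [None] * n_frames
    for start, end, speaker in turns:
        first = int(round(start / step))
        last = min(int(round(end / step)), n_frames)
        for i in range(first, last):
            labels[i] = speaker
    return labels


def frame_error_rate(reference, hypothesis, duration, step=0.1):
    """Wrong frames / reference speech frames, under the best speaker mapping."""
    n = int(round(duration / step))
    ref = frame_labels(reference, n, step)
    hyp = frame_labels(hypothesis, n, step)
    speech = sum(1 for r in ref if r is not None)
    if speech == 0:
        raise ValueError("reference contains no speech")

    ref_speakers = sorted({s for s in ref if s is not None})
    hyp_speakers = sorted({s for s in hyp if s is not None})
    # Pad with dummy labels so a hypothesis speaker can stay unmatched.
    padded = ref_speakers + [f"_unmatched{i}" for i in range(len(hyp_speakers))]

    best = None
    # Brute force over mappings; fine for a handful of speakers only.
    for perm in permutations(padded, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        errors = sum(1 for r, h in zip(ref, hyp) if r != mapping.get(h))
        best = errors if best is None else min(best, errors)
    return best / speech


# Hypothetical example: the speaker change is detected one second late.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 6.0, "spk0"), (6.0, 10.0, "spk1")]
print(frame_error_rate(reference, hypothesis, duration=10.0))  # 0.1
```

In this toy case one second out of ten is attributed to the wrong speaker, giving 10%. A 30% rate would mean roughly a third of the speech time is missed, falsely detected, or assigned to the wrong speaker.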

tomchang25 commented 1 year ago

A successful example of Whisper + speaker diarization:

https://github.com/MahmoudAshraf97/whisper-diarization
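For context, the core idea behind combining the two is to line up transcript timestamps with speaker turns. Below is a minimal sketch of one simple way to do that: give each transcript segment the speaker whose turns overlap it the most. This is not necessarily how the linked project does it. The segment dicts assume `start`, `end`, and `text` keys like those in Whisper's `result["segments"]`, and the speaker turns, labels, and function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker that overlaps it the most.

    segments: list of dicts with "start", "end", "text"
    turns:    list of (start, end, speaker) tuples from a diarization step
    """
    labeled = []
    for seg in segments:
        totals = {}
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # None means no diarization turn covers this segment at all.
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Hypothetical inputs for illustration.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Hello."},
    {"start": 4.0, "end": 9.0, "text": "Hi there."},
]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]

for seg in assign_speakers(segments, turns):
    print(f'[{seg["speaker"]}] {seg["text"]}')
```

Segment-level assignment breaks down when a single transcript segment spans a speaker change. Word-level timestamps would reduce that, at the cost of an extra alignment step.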

tomchang25 commented 1 year ago

Another example: https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization/blob/main/app.py

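    import os

    import torch
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
    from transformers import pipeline

    # MODEL_NAME and lang are not defined in this excerpt; they come from the rest of app.py.
    # Use the first GPU if available, otherwise fall back to the CPU.
    device = 0 if torch.cuda.is_available() else "cpu"
    pipe = pipeline(
        task="automatic-speech-recognition",
        model=MODEL_NAME,
        chunk_length_s=30,
        device=device,
    )
    os.makedirs('output', exist_ok=True)
    # Force Whisper to transcribe (not translate) in the chosen language.
    pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")

    # Speaker embedding model used for the diarization step.
    embedding_model = PretrainedSpeakerEmbedding(
        "speechbrain/spkrec-ecapa-voxceleb",
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"))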
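The excerpt stops after loading the embedding model, so it doesn't show how the embeddings become speaker labels. A common approach is to compute one embedding per segment and group similar embeddings together. The sketch below uses a simple greedy cosine-similarity grouping as a stand-in; the linked app may well use a different clustering method. The toy 2-D vectors, the 0.7 threshold, and the function names are assumptions for illustration, not values from the app.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings, threshold=0.7):
    """Assign each embedding to the most similar existing cluster, or start a new one.

    Returns one integer cluster id (speaker index) per embedding.
    """
    sums = []    # running sum of the embeddings in each cluster
    labels = []
    for emb in embeddings:
        scores = [cosine(emb, s) for s in sums]
        if scores and max(scores) >= threshold:
            k = scores.index(max(scores))
            sums[k] = [a + b for a, b in zip(sums[k], emb)]
        else:
            k = len(sums)
            sums.append(list(emb))
        labels.append(k)
    return labels


# Toy 2-D "embeddings"; real speaker embeddings have hundreds of dimensions.
segment_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
print(greedy_cluster(segment_embeddings))  # [0, 0, 1, 1]
```

Greedy grouping depends on segment order and on the threshold. A batch method such as agglomerative clustering avoids the order dependence, but usually needs either the number of speakers or a distance cutoff.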