sam1am opened 1 year ago
This would be amazing!
I tried using WhisperX to transcribe a podcast with two speakers, but the diarization was really bad. I also thought: hey, since I want to use WhisperX mainly to transcribe the same podcast with the same speakers, it would be great to be able to "teach" WhisperX these specific speakers and how to identify them, instead of starting from scratch for every single transcription.
pyannote.audio 3.x with the embedding functionality should be able to do what you want: build a voiceprint database, take the embedding of each new voice, and find the best cosine match.
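For anyone who wants to experiment before this lands in WhisperX, here is a minimal enrollment-and-matching sketch using the pyannote.audio 3.x `Inference` API, along the lines of the `pyannote/embedding` model card. The reference file names, the `HF_TOKEN` placeholder, and the 0.5 distance threshold are illustrative assumptions, not anything WhisperX ships today.

```python
import numpy as np
from pyannote.audio import Model, Inference
from scipy.spatial.distance import cdist

model = Model.from_pretrained("pyannote/embedding", use_auth_token="HF_TOKEN")
inference = Inference(model, window="whole")  # one embedding per whole file

# Enroll: one clean reference clip per known speaker (file names are made up).
voiceprints = {
    "alice": np.atleast_2d(inference("alice_reference.wav")),
    "bob": np.atleast_2d(inference("bob_reference.wav")),
}

def identify(wav_path, threshold=0.5):
    """Return the enrolled name with the smallest cosine distance, or None."""
    emb = np.atleast_2d(inference(wav_path))           # shape (1, D)
    names = list(voiceprints)
    refs = np.vstack([voiceprints[n] for n in names])  # shape (N, D)
    dists = cdist(emb, refs, metric="cosine")[0]
    best = int(np.argmin(dists))
    # Below the threshold we call it a match; otherwise treat as unknown.
    return names[best] if dists[best] < threshold else None

print(identify("unknown_clip.wav"))
```

The threshold needs tuning on your own speakers, and averaging embeddings from several reference clips per speaker usually makes the voiceprints more robust.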
Short of waiting for WhisperX 4.0, is anyone taking a stab at passing the embeddings through in WhisperX so that we can start testing in version 3?
Given that speaker embeddings are already computed during diarization, it seems like it should be possible to match any new voice against a database of known voices with associated speakers/users/names/IDs. Is there a way to do this currently?
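Not as a built-in feature, as far as I know, but you can post-process the diarization output yourself. A rough sketch, reusing `inference` and `voiceprints` from the snippet above and assuming you have flattened the diarization result into `(start, end, label)` tuples (that tuple format and the threshold are assumptions for illustration):

```python
import numpy as np
from pyannote.core import Segment
from scipy.spatial.distance import cdist

def relabel(audio_path, diarized, voiceprints, inference, threshold=0.5):
    """Map anonymous diarization labels (e.g. SPEAKER_00) to enrolled names."""
    # Pick the longest segment per anonymous label: more audio, steadier embedding.
    longest = {}
    for start, end, label in diarized:
        if label not in longest or end - start > longest[label][1] - longest[label][0]:
            longest[label] = (start, end)

    names = list(voiceprints)
    refs = np.vstack([voiceprints[n] for n in names])
    mapping = {}
    for label, (start, end) in longest.items():
        emb = np.atleast_2d(inference.crop(audio_path, Segment(start, end)))
        dists = cdist(emb, refs, metric="cosine")[0]
        best = int(np.argmin(dists))
        # Keep the anonymous label if no enrolled speaker is close enough.
        mapping[label] = names[best] if dists[best] < threshold else label
    return mapping
```

You would then rewrite the speaker field of each transcript segment through `mapping`. Until WhisperX exposes the embeddings directly, this re-embeds each speaker once per file, which is cheap compared to transcription.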