Any plans to add speaker diarization?

dantheman0207 commented 2 months ago

I realize this isn't included in Whisper out of the box but would love to see this as an additional feature.

Is that something you've considered adding?

dantheman0207 commented 2 months ago

If not, then word-level timestamps or finer-grained segmentation would both make it a lot easier to plug this into a pipeline and use pyannote to add diarization on top.
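To make the pipeline idea concrete, here is a rough sketch of the merge step: label each transcript segment with the speaker whose diarization turns overlap it most. Everything here is an assumption for illustration. The segment dicts with `start`/`end`/`text` in seconds follow the usual Whisper-style layout (this library's output may differ), the `(start, end, speaker)` turn tuples stand in for whatever a diarizer such as pyannote produces, and `assign_speakers` is a made-up helper name.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turns overlap it most.

    segments: list of dicts with "start", "end", "text" (seconds); assumed layout.
    turns: list of (start, end, speaker) tuples from a diarizer; assumed layout.
    """
    labeled = []
    for seg in segments:
        overlap_by_speaker = {}
        for turn_start, turn_end, speaker in turns:
            # Length of the time interval shared by the segment and the turn.
            overlap = min(seg["end"], turn_end) - max(seg["start"], turn_start)
            if overlap > 0:
                overlap_by_speaker[speaker] = overlap_by_speaker.get(speaker, 0.0) + overlap
        # No overlapping turn at all means the speaker stays unknown (None).
        best = max(overlap_by_speaker, key=overlap_by_speaker.get) if overlap_by_speaker else None
        labeled.append({**seg, "speaker": best})
    return labeled


segments = [
    {"start": 0.0, "end": 2.0, "text": "hi there"},
    {"start": 2.0, "end": 5.0, "text": "hello, how are you"},
]
turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 6.0, "SPEAKER_01")]
labeled = assign_speakers(segments, turns)
```

With coarse segments a single segment can span two speakers, which is why word-level timestamps would help: the same overlap rule could then be applied per word instead of per segment.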

nexuslux commented 1 month ago

Yeah, I'm in the same boat. I wanted to use this, but will stick with vanilla mlx whisper for now.

fire17 commented 1 week ago

@mustafaaljadery Also looking for a solution.

I'm using text-to-speech (TTS), and its output can be picked up by Whisper. I'd like LightningWhisperMLX to have a feature to ignore a specific speaker, or at least return some speaker UID.

That way, on the first TTS output, if it's picked up by Whisper, I'll check whether the transcript matches the text the TTS generated. If so, I'll mark that speaker as the AI and discard its input for the rest of the conversation with the user.
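Roughly what I have in mind, as a sketch only: it assumes each segment already carries a `speaker` field, which is exactly the part that doesn't exist yet. The helper names and the 0.8 threshold are made up; the fuzzy match uses `difflib` from the standard library.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def is_tts_echo(transcribed, tts_text, threshold=0.8):
    """Return True if the transcript looks like the text the TTS just spoke.

    threshold is an arbitrary guess and would need tuning on real audio.
    """
    ratio = SequenceMatcher(None, normalize(transcribed), normalize(tts_text)).ratio()
    return ratio >= threshold


def drop_speaker(segments, speaker_id):
    """Keep only segments that were not spoken by speaker_id."""
    return [seg for seg in segments if seg.get("speaker") != speaker_id]


tts_text = "Hello, how can I help you today?"
segments = [
    {"speaker": "S0", "text": "hello how can I help you today"},
    {"speaker": "S1", "text": "what's the weather like"},
]

# Mark the first speaker whose transcript matches the TTS output as the AI.
ai_speaker = next(
    (seg["speaker"] for seg in segments if is_tts_echo(seg["text"], tts_text)),
    None,
)
user_segments = drop_speaker(segments, ai_speaker)
```

Matching on text rather than audio means a user who repeats the AI's sentence word for word would be misclassified, so the check is probably only safe on the very first TTS output.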

Isn't Person ID already included in the original Whisper? Thanks and all the best!

mustafaaljadery / lightning-whisper-mlx

Any plans to add speaker diarization? #6