Open dantheman0207 opened 2 months ago
If not, then either word-level timestamps or finer-grained segmentation would make it a lot easier to plug this into a pipeline and use pyannote to add diarization on top.
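To illustrate why word-level timestamps would help, here is a minimal sketch of how words could be matched to diarization turns by time overlap. The input shapes (`words` as dicts with `start`/`end`/`word`, `turns` as `(start, end, speaker)` tuples) and the function names are assumptions made up for this example; they are not the actual output format of LightningWhisperMLX or pyannote.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    words: list of dicts like {"word": str, "start": float, "end": float}  (hypothetical shape)
    turns: list of (start, end, speaker) tuples from a diarizer            (hypothetical shape)
    Words that overlap no turn get speaker None.
    """
    labeled = []
    for w in words:
        best_speaker = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(w["start"], w["end"], t_start, t_end)
            if ov > best_overlap:
                best_overlap = ov
                best_speaker = speaker
        labeled.append({**w, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    words = [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "there", "start": 0.9, "end": 1.4},
    ]
    turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]
    for item in assign_speakers(words, turns):
        print(item["word"], item["speaker"])
```

With only coarse segments, a single segment can span a speaker change, which is why per-word timing (or tighter segmentation) makes this kind of alignment much more reliable.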
Yeah, I'm in the same boat. I wanted to use this, but will stick with vanilla mlx whisper for now.
@mustafaaljadery Also looking for a solution.
I'm using Text2Speech, whose output can be picked up by Whisper. I'd like LightningWhisperMLX to have a feature to ignore a specific speaker, or at least return some speaker UID.
That way, if the first TTS output is picked up by Whisper, I can check whether the transcript matches the text the TTS generated. If it does, I'll mark that speaker as the AI and discard its input for the rest of the conversation with the user.
Isn't person ID already included in the original Whisper? Thanks and all the best!
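The TTS-matching idea described above could look roughly like the sketch below, assuming the transcription step returned a speaker label per segment. That is the assumption this whole example rests on: the `segments` shape, the `speaker` field, and the function names are hypothetical, and the 0.8 similarity threshold is an arbitrary starting point, not a tuned value.

```python
import difflib
import string


def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace for fuzzy comparison."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def find_ai_speaker(segments, tts_text, threshold=0.8):
    """Return the speaker label of the first segment whose text matches the TTS text.

    segments: list of dicts like {"speaker": str, "text": str}  (hypothetical shape)
    tts_text: the text that was sent to the TTS engine
    Returns None if no segment is similar enough.
    """
    target = normalize(tts_text)
    for seg in segments:
        ratio = difflib.SequenceMatcher(None, normalize(seg["text"]), target).ratio()
        if ratio >= threshold:
            return seg["speaker"]
    return None


def drop_speaker(segments, speaker):
    """Keep only the segments that were not spoken by the given speaker."""
    return [seg for seg in segments if seg["speaker"] != speaker]


if __name__ == "__main__":
    segments = [
        {"speaker": "S1", "text": "Hello, how can I help you today?"},
        {"speaker": "S2", "text": "I need a refund."},
        {"speaker": "S1", "text": "Sure thing."},
    ]
    ai = find_ai_speaker(segments, "hello how can i help you today")
    print(drop_speaker(segments, ai))
```

Fuzzy matching is used instead of exact string equality because the transcript of the TTS audio will rarely match the generated text character for character.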
I realize this isn't included in Whisper out of the box but would love to see this as an additional feature.
Is that something you've at all considered adding?