Open uniqueness-ae opened 1 month ago
Would love to see this as well. I can help build the feature, but I need some pointers on how to approach it.
I tried the pyannote.audio model on rented cloud GPUs and had some success. If there is a way to run it on MLX, it will probably run faster. It would be even better if it were coupled with Whisper to simplify the process. WhisperX, a repo from m-bain, does this and could serve as a reference.
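To make the "coupled with Whisper" idea concrete, here is a rough sketch of the merge step only: given transcript segments and diarization speaker turns, label each segment with the speaker who overlaps it the most. This is not WhisperX's actual code and not an MLX API. The `assign_speakers` and `overlap` helpers are hypothetical names, the `(start, end, speaker)` turn format is an assumption about what a diarizer's output could be flattened into, and the segments are assumed to be dicts with `start`, `end` and `text` keys.

```python
from typing import Dict, List, Tuple

# Assumed diarization output format: (start_sec, end_sec, speaker_label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[dict], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[dict]:
    """Label each transcript segment with the speaker that overlaps it most.

    Hypothetical helper: segments are assumed to be dicts with "start",
    "end" and "text" keys. Segments with no overlapping turn get `unknown`.
    """
    labeled = []
    for seg in segments:
        # Sum the overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            dur = overlap(seg["start"], seg["end"], t_start, t_end)
            if dur > 0:
                totals[speaker] = totals.get(speaker, 0.0) + dur
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": speaker})
    return labeled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 2.0, "text": "Hi there."},
        {"start": 2.0, "end": 5.0, "text": "Hello, how are you?"},
    ]
    turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 6.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(f'[{seg["speaker"]}] {seg["text"]}')
```

Segment-level assignment like this is the simplest option. It will mislabel a segment in which the speaker changes partway through, so word-level timestamps would probably be needed for cleaner results.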
Implement speaker diarization for the existing mlx whisper support to:
This addition will provide more insightful and structured transcripts, making it easier to analyze and understand complex audio content. Thanks