[Feature request] speaker-diarization model

HyunjunA commented 9 months ago

Name of the feature

In general, the feature you want added should be supported by Hugging Face's transformers library:

- If requesting a model, it must be listed here.
- If requesting a pipeline, it must be listed here.
- If requesting a task, it must be listed here.

Model: pyannote/speaker-diarization https://huggingface.co/pyannote/speaker-diarization

Reason for request

Why is it important that we add this feature? What is your intended use case? Remember, we are more likely to add support for models/pipelines/tasks that are popular (e.g., many downloads), or contain functionality that does not exist (e.g., new input type).

Incorporating a speaker diarization model into web apps will enable us to offer advanced audio analysis features like speaker-change-detection, voice-activity-detection, and overlapped-speech-detection. This will set our app apart in a crowded market.

These features not only add functionality but also significantly improve user engagement by providing a more interactive and insightful experience.
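To make the three features named above concrete, here is a minimal Python sketch of how each could be derived from a diarization result. It assumes the model's output has already been reduced to a list of `(start, end, speaker)` segments; the `Segment` type, the function names, and the sample data are hypothetical illustrations and not part of pyannote or Transformers.js.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One diarized speech segment (hypothetical shape, times in seconds)."""
    start: float
    end: float
    speaker: str


def voice_activity(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Merge all segments, regardless of speaker, into speech regions."""
    regions: List[Tuple[float, float]] = []
    for seg in sorted(segments):
        if regions and seg.start <= regions[-1][1]:
            # Segment touches or overlaps the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], seg.end))
        else:
            regions.append((seg.start, seg.end))
    return regions


def speaker_changes(segments: List[Segment]) -> List[float]:
    """Start times of segments whose speaker differs from the previous one."""
    ordered = sorted(segments)
    return [cur.start for prev, cur in zip(ordered, ordered[1:])
            if cur.speaker != prev.speaker]


def overlapped_speech(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Intervals during which two different speakers are active at once."""
    ordered = sorted(segments)
    overlaps: List[Tuple[float, float]] = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later segment can overlap a
            if a.speaker != b.speaker:
                overlaps.append((b.start, min(a.end, b.end)))
    return overlaps


# Made-up sample data for illustration only.
segments = [
    Segment(0.0, 4.0, "SPEAKER_00"),
    Segment(3.5, 7.0, "SPEAKER_01"),
    Segment(9.0, 12.0, "SPEAKER_00"),
]
```

With the sample data, `voice_activity(segments)` yields two speech regions, `speaker_changes(segments)` reports a change at each new speaker's start time, and `overlapped_speech(segments)` reports the half-second where both speakers talk.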

Additional context

Add any other context or screenshots about the feature request here.

xenova commented 9 months ago

Do you have any instructions or example code for how to run the model directly in HF transformers? Transformers.js aims to be a JS port of the Python library, and may not be suitable for custom use cases and libraries like this (pyannote).

However, if it can be run in transformers, then it's a good candidate for adding support here too! 🤗

nikhedonia commented 7 months ago

It seems that speaker turn detection was recently added to whisper.cpp. You can find all the information in this repository: https://github.com/akashmjn/tinydiarize

The PRs that add support to whisper.cpp appear very small, and the new models are 100% compatible because they are fine-tuned versions that don't use new tokens.
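As a rough illustration of what consuming such output could look like, here is a small Python sketch that splits a transcript into speaker turns. It assumes the transcript marks each turn boundary with the literal text `[SPEAKER_TURN]`; that marker, the function name, and the sample transcript are assumptions made for the example, so check the actual output of the tool you use. Note that this only separates turns and does not identify who is speaking.

```python
from typing import List

# Assumed turn marker; verify against the real transcript format.
SPEAKER_TURN = "[SPEAKER_TURN]"


def split_turns(transcript: str, marker: str = SPEAKER_TURN) -> List[str]:
    """Split a transcript into speaker turns at each marker occurrence."""
    parts = (part.strip() for part in transcript.split(marker))
    # Drop empty pieces, e.g. when the transcript ends with a marker.
    return [part for part in parts if part]


# Made-up sample transcript for illustration only.
sample = "Hello, how are you? [SPEAKER_TURN] Fine, thanks. [SPEAKER_TURN]"
for index, turn in enumerate(split_turns(sample), start=1):
    print(f"Turn {index}: {turn}")
```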

xenova / transformers.js

[Feature request] speaker-diarization model #322