pyannote-rs


Pyannote audio diarization in Rust

Features

Install

cargo add pyannote-rs

Usage

See Building

Examples

See examples

How it works

pyannote-rs uses two models for speaker diarization:

1. **Segmentation**: [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) identifies when speech occurs.
2. **Speaker identification**: [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM) identifies who is speaking.

Inference is powered by [onnxruntime](https://onnxruntime.ai/).

- The segmentation model processes up to 10s of audio at a time, using a sliding window approach (iterating in chunks).
- The embedding model processes filter banks (audio features) extracted with [knf-rs](https://github.com/thewh1teagle/knf-rs).

Speaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.
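To make the comparison step concrete, here is a minimal sketch of cosine similarity between two embedding vectors. The function, the toy vectors, and the 0.8 threshold are illustrative assumptions and are not part of the pyannote-rs API; in practice the embeddings come from the wespeaker model and the threshold is tuned to the model and data.

```rust
/// Cosine similarity between two embedding vectors:
/// dot(a, b) / (||a|| * ||b||). Values close to 1.0 suggest the same speaker.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy vectors standing in for speaker embeddings produced by the model.
    let alice_earlier = vec![0.9_f32, 0.1, 0.3, 0.0];
    let new_segment = vec![0.85_f32, 0.15, 0.28, 0.05];

    // A fixed threshold is one simple decision rule (hypothetical value).
    let threshold = 0.8;
    let score = cosine_similarity(&alice_earlier, &new_segment);
    if score > threshold {
        println!("Likely the same speaker (similarity {score:.3})");
    } else {
        println!("Likely a different speaker (similarity {score:.3})");
    }
}
```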

Credits

Big thanks to pyannote-onnx and kaldi-native-fbank