juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

The latency of the wespeaker model is too large #225

Closed · SheenChi closed this issue 3 months ago

SheenChi commented 9 months ago

Hello @juanmc2005, I use the hbredin/wespeaker-voxceleb-resnet34-LM (ONNX) model to extract speaker embeddings in the diarization pipeline, but I found that the per-chunk latency is too large (about 1300 ms) with the default parameters (chunk=5s, step=0.5s, latency=0.5s), which cannot meet my real-time requirement. I saw you reported a delay of 48 ms on CPU and 15 ms on GPU. Is there anything I should pay attention to in order to reproduce your performance? Thank you very much for any suggestions.
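
For context, a pipeline set up as described above might look roughly like the sketch below in diart's Python API. This is an illustration, not code from the issue: the API names follow recent diart releases and the segmentation model name is an assumption, so check them against your installed version.

```python
# Sketch: build the pipeline with the wespeaker ONNX embedding model and the
# default streaming parameters from the question. API and model names are
# assumptions based on recent diart versions; verify locally.
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.models import SegmentationModel, EmbeddingModel

config = SpeakerDiarizationConfig(
    segmentation=SegmentationModel.from_pretrained("pyannote/segmentation"),
    embedding=EmbeddingModel.from_pretrained("hbredin/wespeaker-voxceleb-resnet34-LM"),
    duration=5,   # chunk duration in seconds
    step=0.5,     # sliding window step in seconds
    latency=0.5,  # algorithmic latency in seconds
)
pipeline = SpeakerDiarization(config)
```

A per-chunk figure like the 1300 ms above would typically come from timing each pipeline call on a 5 s chunk, for example with `time.perf_counter()` around the call.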

juanmc2005 commented 9 months ago

Hi @SheenChi, the values I reported were obtained from the output of diart.stream on my hardware: an AMD Ryzen 9 CPU and an Nvidia RTX 4060 Max-Q GPU.

If you find the model too slow on your hardware, you can try pyannote/embedding, which is the fastest one. If that's still not enough, you could try quantizing a model you like or distilling it into a smaller model. Depending on your hardware, I think distillation would be my preferred first step, but it requires training.
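
As a minimal sketch of the first suggestion, swapping in pyannote/embedding while keeping everything else at the defaults might look like this (same assumed config-based API as in the earlier sketch; verify against your diart version):

```python
# Sketch: replace only the embedding model with the lighter pyannote/embedding,
# keeping default duration/step/latency. API names are assumptions.
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.models import EmbeddingModel

config = SpeakerDiarizationConfig(
    embedding=EmbeddingModel.from_pretrained("pyannote/embedding"),
)
pipeline = SpeakerDiarization(config)
```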

For training I recommend pyannote.audio, as it's very reliable for this use case and would give you instant compatibility with diart.