kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

New to using Kaldi, just need a model to extract good voice embeddings in a python script from .wav files #4944

Open PhilipAmadasun opened 1 day ago

PhilipAmadasun commented 1 day ago

Does anyone have an example Python script that uses one of the x-vector extraction models developed here to extract embeddings? I've gone through some of the repo and have not found one.

I've tried other pre-trained embedding models, such as the one from pyannote, but the extracted vectors did not represent speakers very accurately when scrutinized with cosine similarity (a lot of false positives and false negatives).

I'm still testing an embedding model from SpeechBrain, but I would love to try the one developed in Kaldi, as it was recommended to me.

I would be very grateful for any help in this matter.

csukuangfj commented 1 day ago

Please have a look at next-gen Kaldi.

You can find Python examples at

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speaker-identification.py

and at https://github.com/k2-fsa/sherpa-onnx/tree/master/python-api-examples

(Search for filenames containing the string speaker)

csukuangfj commented 1 day ago

Note: All you need to do to install sherpa-onnx is run

pip install sherpa-onnx

It supports Linux (arm64, arm32, x64), Windows (x64, x86, arm64), macOS (x64, arm64), etc.
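
For the cosine-similarity check mentioned in the question, here is a minimal sketch of how two embeddings could be compared once they have been extracted, whichever model produced them. It is not taken from sherpa-onnx or Kaldi: the function names `cosine_similarity` and `same_speaker`, the toy vectors, and the threshold of 0.5 are all assumptions made for illustration. A real threshold would need to be tuned on labeled recordings for the model in use.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Decide whether two embeddings match.

    The default threshold is a placeholder (assumption), not a
    recommended value; tune it on labeled data for your model.
    """
    return cosine_similarity(a, b) >= threshold


# Toy 4-dimensional vectors standing in for real speaker embeddings,
# which typically have a few hundred dimensions.
enrolled = [0.1, 0.3, -0.2, 0.9]
test_same = [0.12, 0.28, -0.25, 0.85]
test_other = [-0.7, 0.1, 0.6, -0.2]

if __name__ == "__main__":
    print(same_speaker(enrolled, test_same))
    print(same_speaker(enrolled, test_other))
```

If a single threshold gives many false positives and false negatives, sweeping the threshold over a set of known same-speaker and different-speaker pairs usually shows whether the embeddings or the threshold is the limiting factor.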