Open iMayK opened 2 years ago
I am not sure what you mean, or what your intended use-case is. I do not provide any fine-tuned models, only the code to train a speaker verification system based on various network architectures, such as ECAPA-TDNN, x-vector, and the (novel) wav2vec2.
To answer your question to the best of my knowledge and understanding: each model implements the following method, which you can use to retrieve a fixed-length speaker embedding: https://github.com/nikvaessen/w2v2-speaker/blob/ac8db165348328f79bd17162fd63a8cf500b7859/src/lightning_modules/speaker/speaker_recognition_module.py#L110
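To illustrate the general idea behind that method (not the repo's actual implementation): a speaker embedding is obtained by pooling variable-length frame-level features into a single fixed-length vector, so utterances of any duration map to embeddings of the same dimension. A minimal NumPy sketch, using simple mean pooling and an assumed 192-dimensional feature space purely for illustration:

```python
import numpy as np

def extract_embedding(frame_features: np.ndarray) -> np.ndarray:
    """Pool (num_frames, feature_dim) frame-level features into a
    fixed-length (feature_dim,) speaker embedding via mean pooling.
    Illustrative only -- the repo's models use learned pooling layers."""
    return frame_features.mean(axis=0)

# Two utterances of different lengths, same feature dimension (192 is assumed).
short_utt = np.random.randn(50, 192)
long_utt = np.random.randn(300, 192)

emb_a = extract_embedding(short_utt)
emb_b = extract_embedding(long_utt)
print(emb_a.shape, emb_b.shape)  # both (192,), regardless of input length
```

The actual models in the repo (ECAPA-TDNN, x-vector, wav2vec2) replace the mean pooling here with learned pooling mechanisms, but the input/output contract is the same: variable-length audio features in, fixed-length embedding out.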
Is there a direct function I can call to extract a fixed-length embedding for each audio sample using a pre-trained model?