nikvaessen / w2v2-speaker

Research code for the paper "Fine-tuning wav2vec2 for speaker recognition" found at https://arxiv.org/abs/2109.15053
MIT License
141 stars, 14 forks

How can I get a fixed-length embedding for each audio sample? #1

Open iMayK opened 2 years ago

iMayK commented 2 years ago

Is there a function I can call directly to extract a fixed-length embedding for each audio sample using the pre-trained model?

nikvaessen commented 2 years ago

I am not sure what you mean or what your intended use case is. I do not provide any fine-tuned models, only the code to train a speaker verification system based on various network architectures, such as ECAPA-TDNN, x-vector, and the (novel) wav2vec2.

To answer your question to the best of my understanding: each model implements the following method, which you can use to retrieve a fixed-length speaker embedding: https://github.com/nikvaessen/w2v2-speaker/blob/ac8db165348328f79bd17162fd63a8cf500b7859/src/lightning_modules/speaker/speaker_recognition_module.py#L110
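
For readers wondering how a variable-length recording can yield a fixed-length vector at all, here is a minimal sketch of one common approach: averaging frame-level features over time (mean pooling). This is an illustration only. The function name `mean_pool` and the toy inputs are hypothetical, and whether the linked method pools in this way is an assumption, not something confirmed in this thread.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors over the time axis.

    `frames` has shape (num_frames, feature_dim). The result always has
    length feature_dim, no matter how many frames the audio produced.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same feature dimension")
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


# Toy stand-ins for an encoder's frame-level output (hypothetical values):
# a short clip with 2 frames and a longer clip with 4 frames, 3 features each.
short_clip = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
long_clip = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0], [6.0, 6.0, 6.0]]

short_embedding = mean_pool(short_clip)
long_embedding = mean_pool(long_clip)

# Both embeddings have the same length despite the different clip durations.
print(len(short_embedding), len(long_embedding))
```

In a real system the frames would come from a trained network rather than hand-written lists, and the embedding dimension would be much larger; the point is only that pooling over time removes the dependence on audio length.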