iiscleap / NeuralPlda

Implementation of Neural PLDA (NPLDA) model (A discriminative backend for Speaker Verification)

Pre-trained model #4

Closed pollytur closed 3 years ago

pollytur commented 3 years ago

Hello,

Thank you for your code. I would like to run it to compare its performance with other approaches. The x-vector extractor was trained using 30-dimensional Mel-Frequency Cepstral Coefficients, but I have not managed to find a pre-trained model for the x-vector extractor or for the NPLDA here. Are they available, or have you decided not to share them?

Also, if I wanted to train on my own, in what order should the scripts be run? As far as I can tell, there should be some preprocessing first and then xvector_DPlda_pytorch.py. However, the preprocessing differs a lot conceptually between the recipes in your code (a different number of stages, for instance). What is the reason for this?

prash29 commented 3 years ago

Hi,

Thank you for showing interest in our work. Regarding your question about pre-trained models: the folder "Kaldi_Models" contains the pre-trained Kaldi x-vector model we use, along with the mean vector, transform matrix, and trained PLDA model.

I don't fully understand your second question, but I hope this answers it. The prerequisite before running the dataprep code is to have the x-vectors generated for the dataset you are using; we store them in a pickle file as a dictionary (not uploaded due to size constraints). Once you have that, plus the trials generated for training, you should be good to go. We have a different number of stages for different experiments because of the number of datasets used. For example, the SRE18 experiments use Switchboard, Mixer6, VoxCeleb and older SRE data, and each of these datasets stores its gender information in metadata files with different formats. Because of these differences, we use a different number of stages to parse the gender information and obtain gender-matched trials.
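For reference, here is a minimal sketch of what such a pickle file could look like; the utterance-ID key format, the 512-dimensional vectors, and the file name are illustrative assumptions, not conventions fixed by the repo:

```python
import pickle
import numpy as np

# Minimal sketch of an x-vector pickle: a dictionary mapping utterance IDs
# to x-vectors. The key format, the 512-dim size, and the file name are
# illustrative assumptions, not conventions fixed by the repo.
xvectors = {
    "spk001-utt0001": np.zeros(512, dtype=np.float32),
    "spk001-utt0002": np.zeros(512, dtype=np.float32),
}

with open("xvectors_train.pkl", "wb") as f:
    pickle.dump(xvectors, f)

# The training scripts can then load the dictionary back:
with open("xvectors_train.pkl", "rb") as f:
    xvectors = pickle.load(f)
```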

You can also try a modified, quicker sampling technique discussed in our recent Interspeech paper: https://arxiv.org/abs/2008.04527 (repo link: https://github.com/iiscleap/E2E-NPLDA). In this technique, given a labelled set of utterances, we sample a much larger number of trials on the fly, one batch at a time, which makes training considerably faster.
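To illustrate the idea, here is a simplified sketch of batch-wise trial sampling; this is an illustration only, not the E2E-NPLDA implementation, and the function name and 50/50 target ratio are assumptions:

```python
import random

def sample_trials(utt2spk, batch_size, target_ratio=0.5):
    """Draw a batch of (enrol_utt, test_utt, label) trials on the fly
    from labelled utterances. Simplified illustration only."""
    # Invert the utterance->speaker map into speaker->utterances.
    spk2utts = {}
    for utt, spk in utt2spk.items():
        spk2utts.setdefault(spk, []).append(utt)
    # Target trials need speakers with at least two utterances.
    multi = [s for s, utts in spk2utts.items() if len(utts) >= 2]

    trials = []
    for _ in range(batch_size):
        if random.random() < target_ratio:
            spk = random.choice(multi)
            a, b = random.sample(spk2utts[spk], 2)
            trials.append((a, b, 1))   # target trial: same speaker
        else:
            s1, s2 = random.sample(sorted(spk2utts), 2)
            trials.append((random.choice(spk2utts[s1]),
                           random.choice(spk2utts[s2]), 0))  # non-target trial
    return trials

# Example: three labelled utterances, sample a batch of four trials.
print(sample_trials({"u1": "A", "u2": "A", "u3": "B"}, batch_size=4))
```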

Hope this helps!


pollytur commented 3 years ago

Yes, this really helped. Thank you for the detailed answer. One more question about "Kaldi_Models" - what is "final.raw" there?

iiscleap commented 3 years ago

final.raw is the raw file containing the x-vector (TDNN) network parameters trained with the Kaldi toolkit. You can use the Kaldi binary nnet3-copy to view its contents, or the script https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/nnet3/report/convert_model.py to convert final.raw to a pickled Python dictionary.
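As a rough sketch, once converted you can inspect the parameters like this; the exact conversion arguments, output file name, and the pickle's value types are assumptions, so check each tool's usage first:

```python
import pickle

# Rough sketch of inspecting the converted model. The commands below are
# indicative only; check each tool's usage before running:
#   nnet3-copy --binary=false final.raw final.txt    # view parameters as text
#   python steps/nnet3/report/convert_model.py ...   # final.raw -> final.pkl
with open("final.pkl", "rb") as f:
    params = pickle.load(f)

# Print each parameter's name and shape (assuming array-like values).
for name, value in params.items():
    print(name, getattr(value, "shape", type(value)))
```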

pollytur commented 3 years ago

Ok, thank you!