Open alam-botify opened 3 years ago
The code available in this repository is intended only for the speaker identification task; speaker verification is out of scope for this project.
The d_vector you are talking about is the vector of probabilities, one for each class in the dataset.
The current model is smaller than 12 MB, but the speed will depend on the hardware you use. For more details about the model, see our paper (https://arxiv.org/pdf/2004.00132.pdf)
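To illustrate the point about the output vector, here is a minimal sketch in plain Python. It is not code from this repository: the `softmax` helper and the 4-speaker scores are made up for the example. It only shows that a classifier output of this kind has one probability per class, so its length always equals the number of speakers.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a toy 4-speaker dataset (the dataset in this thread has 462).
logits = [2.0, 0.5, -1.0, 0.1]
posteriors = softmax(logits)

# One probability per class, so the vector length equals the number of speakers.
predicted_speaker = max(range(len(posteriors)), key=posteriors.__getitem__)
print(len(posteriors), predicted_speaker)
```

With 462 speakers the same vector would have 462 entries, which is why it cannot be resized freely.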
OK, I got it. It will work for the speaker recognition task.
Sorry, my mistake: I misunderstood it as a d_vector.
I wrote a script that computes the d_vector for AM-MobileNet1D, based on SincNet's compute_d_vector. Here is the link: https://drive.google.com/file/d/1mTZYXJ8gjd2ICIjLvd31ovdCNs5qjZRb/view?usp=sharing
Could you please check the script above?
Correct me if I am wrong: as far as I know, class_lay[-1] = 462, which is simply the number of classes (speakers).
When I change d_vector_dim to any value other than class_lay[-1], I get this error:
File "compute_d_vector_AM-Mobilenet1D.py", line 168, in
So, if my approach to computing the d_vector is correct, how can I change this dimension? Or do I need to train the model with that number of classes (say, 128)?
Thanks.
Changing the tensor size only makes sense if you are working with a different dataset that has a different number of classes. In that case, set the new number of classes in the 'class_lay' parameter of the cfg file.
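A rough sketch of why the size cannot be changed independently, in plain Python. The function `average_chunks` and the stand-in outputs are hypothetical and are not taken from the linked script (whose actual error message is not shown in this thread). The idea is only that the model emits class_lay[-1] values per chunk, so any buffer that collects them must have the same length.

```python
def average_chunks(chunk_outputs, d_vector_dim):
    """Average per-chunk output vectors into one vector of length d_vector_dim."""
    for out in chunk_outputs:
        if len(out) != d_vector_dim:
            raise ValueError(
                f"model output has {len(out)} values, buffer expects {d_vector_dim}"
            )
    n = len(chunk_outputs)
    # zip(*...) walks the vectors column by column, giving one mean per class.
    return [sum(col) / n for col in zip(*chunk_outputs)]

n_classes = 462  # class_lay[-1] as quoted in this thread
# Stand-in for the model's per-chunk outputs (uniform values, for illustration only).
chunk_outputs = [[1.0 / n_classes] * n_classes for _ in range(3)]

# Works: the buffer size matches the number of classes.
ok = average_chunks(chunk_outputs, d_vector_dim=n_classes)

# Fails: 128 does not match the 462 values the model produces.
try:
    average_chunks(chunk_outputs, d_vector_dim=128)
    mismatch = None
except ValueError as err:
    mismatch = str(err)
print(len(ok), mismatch)
```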
Hi,
I am trying to compute d_vectors for a speaker diarization or speaker verification task using the AM-MobileNet1D model.
I have modified my previous inference script to compute the d_vector of test audio chunks.
Here is the link to the d_vector computation script: https://drive.google.com/file/d/1VOot_amZdV7bt2ZZU0puWn9i6dkQKz-1/view?usp=sharing
My questions are:
1. I am getting a d_vector of size [462], which is just class_lay[-1]. How can I get a d_vector of size 128, 256, 512, or any other dimension we want?
2. I want to test this model on mobile devices for speaker recognition and speaker diarization. How feasible is it on a mobile device in terms of speed and accuracy?
Thanks
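For context on the first question: in the d-vector literature the embedding is usually taken from the last hidden layer, averaged over chunks, and not from the class probabilities, so its size is the width of that layer instead of the number of speakers. Below is a minimal plain-Python sketch of that averaging step only. It assumes per-chunk hidden activations are already available; whether and how AM-MobileNet1D exposes such a layer is not confirmed in this thread, and the 4-value activations are made up.

```python
import math

def d_vector_from_hidden(hidden_per_chunk):
    """Average per-chunk hidden activations, then L2-normalise the mean."""
    n = len(hidden_per_chunk)
    mean = [sum(col) / n for col in zip(*hidden_per_chunk)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0  # avoid dividing by zero
    return [v / norm for v in mean]

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (reduces to a dot product)."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical hidden activations for two chunks, with a toy hidden width of 4.
hidden_per_chunk = [
    [1.0, 2.0, 2.0, 0.0],
    [1.0, 2.0, 2.0, 0.0],
]
embedding = d_vector_from_hidden(hidden_per_chunk)

# The embedding length follows the hidden width, not the number of speakers.
print(len(embedding), cosine(embedding, embedding))
```

Since this repository targets identification only, treat this as a generic illustration and not as a supported feature of the model.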