joaoantoniocn / AM-MobileNet1D

The Additive Margin MobileNet1D is a new lightweight deep learning model for Speaker Recognition, based on the MobileNetV2 architecture and the Additive Margin Softmax (AM-Softmax) loss function.

How to find a d_vector of custom length using the AM-MobileNet1D model #4

Open alam-botify opened 3 years ago

alam-botify commented 3 years ago

Hi,

I am trying to compute the d_vector for a speaker diarization or speaker verification task using the AM-MobileNet1D model.

I have modified my previous inference script to compute the d_vector of test audio chunks.

Here is the link to the d_vector computation script: https://drive.google.com/file/d/1VOot_amZdV7bt2ZZU0puWn9i6dkQKz-1/view?usp=sharing

My questions are:

  1. I am getting a d_vector of size [462], which is just class_lay[-1]. How can I get a d_vector of size 128, 256, 512, or any other dimension we want?

  2. I want to test this model on mobile devices for speaker recognition and speaker diarization. How feasible is that on a mobile device in terms of speed and accuracy?

Thanks

joaoantoniocn commented 3 years ago

The code available in this repository is intended only for the speaker identification task; speaker verification is outside the scope of this project.

The d_vector you are referring to is the vector of probabilities for each class in the dataset.

The current model is smaller than 12 MB, but the speed will depend on the hardware you use. For more details about the model, see our paper (https://arxiv.org/pdf/2004.00132.pdf).
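
To illustrate the distinction drawn in the reply above, here is a minimal pure-Python sketch of a toy classifier (not the AM-MobileNet1D code). It assumes the common d-vector convention of reading activations from a hidden layer that sits before the classification layer. The sizes `EMBED_DIM` and `INPUT_DIM`, the random weights, and all helper names are hypothetical. The only point is that the classifier output has one entry per training speaker, while the hidden-layer width is a separate architectural choice.

```python
import math
import random

random.seed(0)

NUM_CLASSES = 462   # number of speakers, as reported in this thread
EMBED_DIM = 128     # hypothetical width of the last hidden layer
INPUT_DIM = 40      # hypothetical input feature size


def make_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


hidden_weights = make_matrix(EMBED_DIM, INPUT_DIM)        # toy feature extractor
classifier_weights = make_matrix(NUM_CLASSES, EMBED_DIM)  # toy classification layer

frame = [random.uniform(-1.0, 1.0) for _ in range(INPUT_DIM)]

# Hidden activation (ReLU): this is where an embedding would be read.
embedding = [max(0.0, v) for v in matvec(hidden_weights, frame)]

# Classifier output: one probability per training speaker.
class_probs = softmax(matvec(classifier_weights, embedding))

print(len(embedding))    # hidden-layer width (EMBED_DIM)
print(len(class_probs))  # number of classes (NUM_CLASSES)
```

Under this assumption, getting a 128- or 256-dimensional vector would mean reading a layer of that width from a model that has one, not resizing the class output.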

alam-botify commented 3 years ago

OK, I got it. It will work for the speaker recognition task.

Sorry, my mistake; I wrongly took it to be a d_vector.

I wrote a script for computing the d_vector with AM-MobileNet1D, based on SincNet's compute_d_vector. Here is the link: https://drive.google.com/file/d/1mTZYXJ8gjd2ICIjLvd31ovdCNs5qjZRb/view?usp=sharing

Could you please check the script above?

Correct me if I am wrong: as far as I know, class_lay[-1] = 462, which is simply the number of classes (speakers).

When I change d_vector_dim to any value other than class_lay[-1], I get this error: File "compute_d_vector_AM-Mobilenet1D.py", line 168, in :] = MOBILENET_net(inp) RuntimeError: The expanded size of the tensor (128) must match the existing size (462) at non-singleton dimension 1. Target sizes: [261, 128]. Tensor sizes: [261, 462]

So, if my approach to computing the d_vector is correct, how can I change this dimension? Or do I need to train the model for that number of classes (say, 128)?

Thanks.
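
The error reported above is a shape mismatch: the destination has 128 columns while the model output has 462. The following pure-Python sketch reproduces the same kind of check with plain lists, using the row and column counts from the error message. The helpers `allocate` and `copy_into` are hypothetical stand-ins, not functions from the script or from PyTorch.

```python
def allocate(num_rows, width):
    """Preallocate a num_rows x width buffer of zeros."""
    return [[0.0] * width for _ in range(num_rows)]


def copy_into(buffer, start, batch_outputs):
    """Copy a batch of output rows into the buffer, checking row widths."""
    for offset, row in enumerate(batch_outputs):
        target = buffer[start + offset]
        if len(row) != len(target):
            raise ValueError(
                f"output width {len(row)} does not match buffer width {len(target)}"
            )
        target[:] = row


N_FRAMES = 261      # rows, from the error message
MODEL_WIDTH = 462   # model output width, from the error message

# Stand-in for the model output of shape [261, 462].
outputs = [[0.0] * MODEL_WIDTH for _ in range(N_FRAMES)]

# Mismatched case: buffer sized with an arbitrary width of 128.
try:
    copy_into(allocate(N_FRAMES, 128), 0, outputs)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Consistent case: buffer width taken from the output itself.
buffer = allocate(N_FRAMES, len(outputs[0]))
copy_into(buffer, 0, outputs)

print(mismatch_error)
```

Sizing the buffer from the output avoids the mismatch, but it does not change what the model produces: the width stays equal to the number of classes the model was built with.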

joaoantoniocn commented 3 years ago

It only makes sense to change the tensor size if you are working with a different dataset that has a different number of classes. In that case, you would change it by setting the new number of classes in the 'class_lay' parameter of the cfg file.
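
As a rough sketch of the change described above, the snippet below updates `class_lay` with Python's standard `configparser`. It assumes an INI-style cfg file; the `[class]` section name, the inline cfg text, and the value 128 are assumptions for illustration and may not match the repository's actual cfg files.

```python
import configparser

# Hypothetical cfg fragment; the section name is an assumption.
CFG_TEXT = """
[class]
class_lay=462
"""


def set_num_classes(cfg_text, num_classes):
    """Return a parsed config with class_lay set to the new class count."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    parser["class"]["class_lay"] = str(num_classes)
    return parser


new_speakers = 128  # hypothetical number of speakers in a new dataset
updated = set_num_classes(CFG_TEXT, new_speakers)

print(updated["class"]["class_lay"])
```

Because this changes the size of the output layer, a classifier trained for 462 speakers would presumably need to be retrained on the new dataset before its outputs are meaningful.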