modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

How to compute the ERes2Net model parameters? #55

Closed: JiJiJiang closed this issue 7 months ago

JiJiJiang commented 7 months ago

Hello, I used the same model parameters as your config in https://github.com/alibaba-damo-academy/3D-Speaker/blob/6f6ed3189a4d1db040586a518c8e5d80f4fc0665/egs/3dspeaker/sv-eres2net/conf/eres2net.yaml, but I get 9.88M parameters (yours is 4.6M).

Here is how I compute the model parameters: [screenshot]. I'm wondering where the difference comes from.
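The screenshot itself is not preserved. In PyTorch, a common way to count parameters is `sum(p.numel() for p in model.parameters())`. The sketch below mirrors that idea without depending on PyTorch: it takes (name, shape) pairs, such as those produced by `[(n, tuple(p.shape)) for n, p in model.named_parameters()]`, and can optionally skip parameters whose names start with given prefixes, which is how a training-only head would be left out of the count. The helper `count_params` and the layer names and shapes in the toy list are hypothetical and do not describe the real ERes2Net layout.

```python
from math import prod
from typing import Iterable, Sequence, Tuple


def count_params(named_shapes: Iterable[Tuple[str, Sequence[int]]],
                 exclude_prefixes: Tuple[str, ...] = ()) -> int:
    """Sum the element counts of all parameter tensors, skipping any tensor
    whose name starts with one of exclude_prefixes (e.g. a training-only head)."""
    total = 0
    for name, shape in named_shapes:
        if name.startswith(exclude_prefixes):  # empty tuple never matches
            continue
        total += prod(shape)  # prod(()) == 1, so scalar parameters count once
    return total


# Hypothetical toy model: names and shapes are illustrative only.
toy = [
    ("backbone.conv.weight", (32, 1, 3, 3)),  # 288
    ("backbone.fc.weight", (192, 256)),       # 49,152
    ("backbone.fc.bias", (192,)),             # 192
    ("classifier.weight", (1000, 192)),       # 192,000
]
print(count_params(toy))                                     # 241632
print(count_params(toy, exclude_prefixes=("classifier.",)))  # 49632
```

Whether a head is counted depends only on which prefixes are excluded, so two people running the same model can report different totals if they draw that boundary differently.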

JiJiJiang commented 7 months ago

Even if I set embedding_size=192, I still get 6.61M.
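Only the fully connected embedding layer after pooling depends on embedding_size, so the gap between the two totals should match that layer's size difference. The sketch below checks this, assuming a 10240-dimensional pooled input. That value is my reading of the ERes2Net defaults (80-dim fbank features, m_channels=32, block expansion 2, mean plus standard-deviation pooling) and is not stated in this thread; `linear_params` is a hypothetical helper.

```python
def linear_params(in_dim: int, out_dim: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer: an out_dim x in_dim weight
    plus an optional bias vector of length out_dim."""
    return in_dim * out_dim + (out_dim if bias else 0)


# Assumption: pooled size = 2 (mean + std) * 512 channels * 10 frequency bins,
# where 512 = m_channels (32) * 8 * expansion (2) and 10 = 80 fbank bins / 8.
POOLED_DIM = 2 * (32 * 8 * 2) * (80 // 8)

emb_512 = linear_params(POOLED_DIM, 512)
emb_192 = linear_params(POOLED_DIM, 192)
print(POOLED_DIM)         # 10240
print(emb_512 - emb_192)  # 3277120, i.e. about 3.28M
```

That is close to the reported gap of 9.88M - 6.61M = 3.27M (both totals are rounded to 0.01M), which is consistent with the assumed pooled dimension.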

yfchenlucky commented 7 months ago

The difference comes from how we count the model parameters. Since the classifier isn't used during inference, we exclude it from the parameter count.

JiJiJiang commented 7 months ago

Thank you for your answer! But which part of the ERes2Net model do you mean by the classifier? Is it the output linear layer that maps the embedding to the speaker labels? That layer is not defined in the model, though.

yfchenlucky commented 7 months ago

Yes, the classifier is the output linear layer that maps the embedding to the speaker labels. Since its parameters are discarded at inference time, they are not included in the parameter count.

JiJiJiang commented 7 months ago

Thank you for your answer. I initialize the ERes2Net model directly as defined in ResNet.py, which does not contain the classifier you mention above. The lines in the screenshot were appended directly to the end of your ResNet.py, and I ran python ResNet.py. So I think my result should be consistent with yours. What is wrong with my code?

It would be great if you could share the code you used to calculate the model parameters. Thanks so much!

yfchenlucky commented 7 months ago

Apologies for the oversight: I left out the parameters after the statistics pooling layer. With an embedding size of 192, the model has 6.61M parameters in total; with an embedding size of 512, it has 9.88M. I'll update the arXiv paper and GitHub soon. Thank you very much for pointing this out.
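Subtracting the post-pooling embedding layer from each corrected total gives a rough check that this layer accounts for the gap to the originally reported 4.6M. The sketch assumes the embedding layer is a biased Linear(10240, embedding_size); the 10240 input size is my reading of the ERes2Net defaults, not a figure from this thread, and `backbone_estimate` is a hypothetical helper.

```python
# Assumption: the embedding layer is Linear(10240, embedding_size) with bias.
POOLED_DIM = 10240


def backbone_estimate(total_params: float, embedding_size: int) -> float:
    """Estimate the parameters before the embedding layer, in millions."""
    embedding_layer = POOLED_DIM * embedding_size + embedding_size
    return (total_params - embedding_layer) / 1e6


# Totals reported in the thread (rounded to 0.01M).
for emb, total in ((192, 6.61e6), (512, 9.88e6)):
    print(emb, round(backbone_estimate(total, emb), 2))
# 192 4.64
# 512 4.64
```

Both estimates land at about 4.64M regardless of embedding size, which agrees with the 4.6M figure that omitted the layer after statistics pooling.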