wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0
707 stars 116 forks

[feature] add redimnet #346

Closed wsstriving closed 2 months ago

wsstriving commented 2 months ago

Add ReDimNet as mentioned in https://github.com/wenet-e2e/wespeaker/issues/341

wsstriving commented 2 months ago

I have validated the performance of B2 and B3, and it is acceptable, but it's hard to validate the other, bigger models due to resource limitations ...

czl66 commented 2 months ago

Hi, I trained redimnetb2 on cnceleb with 8x4090, using your model and config (06025169e6a32cb59e7c9bcdcd74ee4ba321aa93), but my results do not look very good. The results are:

[Image: WeCom screenshot_d5a0bc07-cdc3-4215-ac85-99d75e30e187]

The final acc and final loss are comparable,

[Image: WeCom screenshot_182cf4c2-ad12-44a7-a228-5c0a03ff0a3d]

and the differences between the two experiments' configs are listed below:

[Image: WeCom screenshot_04f43b10-2d31-4f43-a193-3528ffadf35e]

Is this reasonable? Thanks for your reply.

wsstriving commented 2 months ago

Unfortunately, I am currently too busy to tune the experiments. I will present the results I have so far below. I believe we can first merge the implementations, as the performance is acceptable given the model's size. I would greatly appreciate any assistance from someone with sufficient resources to help with this implementation.

Main Problems:

The default SphereFace2 loss does not work well for me (it yields worse results than the arc_margin-based approaches). While large margin fine-tuning is beneficial, its impact is limited.
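
For readers unfamiliar with the term, arc_margin refers to the additive angular margin (ArcFace-style) softmax, where a margin m is added to the angle between an embedding and its target-class weight before the softmax. The following is a minimal NumPy sketch of that idea, not the WeSpeaker implementation; the function name `arc_margin_logits` and the default `scale` and `margin` values are illustrative assumptions. Large margin fine-tuning roughly corresponds to continuing training for a short stage with a larger `margin`, usually together with longer training segments.

```python
import numpy as np


def arc_margin_logits(embeddings, weights, labels, scale=32.0, margin=0.2):
    """Additive angular margin logits (illustrative sketch only).

    embeddings: (batch, dim) speaker embeddings
    weights:    (num_classes, dim) class weight vectors
    labels:     (batch,) integer target class per embedding
    """
    # L2-normalise both sides so the dot product is cos(theta).
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(emb @ w.T, -1.0, 1.0)

    # Add the margin to the target-class angle only.
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    theta[rows, labels] += margin

    # Simplification: real implementations also guard the case
    # theta + margin > pi, where cos() stops being monotonic.
    return scale * np.cos(theta)
```

The returned logits would then be passed to a standard cross-entropy loss; the margin makes the target class harder to satisfy, which pushes same-speaker embeddings closer together.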

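Redimnet v0, arc_margin

| Test set | EER (no score norm) | minDCF (no score norm) | EER (AS-Norm) | minDCF (AS-Norm) |
|---|---|---|---|---|
| vox1_O | 1.244 | 1.181 | 1.128 | 0.109 |
| vox1_E | 1.248 | 2.008 | 1.181 | 0.125 |
| vox1_H | 2.172 | 0.210 | 2.008 | 0.186 |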

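Redimnet v2, arc_margin

| Test set | EER (no score norm) | minDCF (no score norm) | EER (AS-Norm) | minDCF (AS-Norm) |
|---|---|---|---|---|
| vox1_O | 0.712 | 0.059 | 0.718 | 0.072 |
| vox1_E | 0.894 | 0.095 | 0.869 | 0.094 |
| vox1_H | 1.621 | 0.161 | 1.545 | 0.149 |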

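Redimnet v3, arc_margin

| Test set | EER (no score norm) | minDCF (no score norm) | EER (AS-Norm) | minDCF (AS-Norm) |
|---|---|---|---|---|
| vox1_O | 0.574 | 0.045 | 0.537 | 0.045 |
| vox1_E | 0.822 | 0.095 | 0.790 | 0.089 |
| vox1_H | 1.576 | 0.157 | 1.433 | 0.140 |
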
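
For anyone comparing their own numbers against the tables above, here is a rough NumPy sketch of the two scoring steps involved: adaptive score normalization (AS-Norm) and the equal error rate (EER). It is not the WeSpeaker scoring code. The function names, the `top_n` default, and the simple nearest-crossing EER estimate (no interpolation, no special handling of tied scores) are all assumptions made for illustration. `compute_eer` returns a fraction, whereas the EER columns above appear to be in percent.

```python
import numpy as np


def cosine_scores(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def as_norm(enroll, test, cohort, top_n=300):
    """Adaptive S-norm of one trial score (illustrative sketch).

    enroll, test: (dim,) embeddings of the two sides of a trial
    cohort:       (num_cohort, dim) impostor cohort embeddings
    """
    raw = float(cosine_scores(enroll[None], test[None])[0, 0])

    def top_stats(x):
        # Mean and std of the top_n highest cohort scores for x.
        scores = cosine_scores(x[None], cohort)[0]
        top = np.sort(scores)[-top_n:]
        return top.mean(), top.std()

    mu_e, sd_e = top_stats(enroll)
    mu_t, sd_t = top_stats(test)
    # Average of the enrollment-side and test-side normalised scores.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


def compute_eer(scores, labels):
    """Approximate EER; labels are 1 for target trials, 0 for non-target."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    sorted_labels = labels[np.argsort(-scores)]  # highest score first
    n_tar = sorted_labels.sum()
    n_non = len(sorted_labels) - n_tar
    # Rates after accepting the k highest-scoring trials, k = 1..N.
    false_accept = np.cumsum(1 - sorted_labels) / n_non
    false_reject = 1.0 - np.cumsum(sorted_labels) / n_tar
    idx = np.argmin(np.abs(false_accept - false_reject))
    return 0.5 * (false_accept[idx] + false_reject[idx])
```

In this sketch the "No score norm" columns would correspond to running `compute_eer` on the raw cosine scores, and the "AS-Norm" columns to running it on the output of `as_norm` for each trial.
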
didi1233 commented 2 days ago

@wsstriving Hello! Recently, I have been working on reproducing the models from your code and the updated code from the subsequent work of the paper's authors. If the training goes smoothly, I will provide the necessary assistance. My question is: are the results from your code before or after LMF?