modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

ERes2Net result on VoxCeleb is not comparable #38

Closed penguinwang96825 closed 10 months ago

penguinwang96825 commented 10 months ago

I ran the exact same script for the ERes2Net experiment on VoxCeleb. The EER and minDCF I got are 1.0105 and 0.1146, which are not comparable to the results reported in the paper. The only difference is that I trained the model on 4 A100 machines, but I doubt that is the reason behind it. Could you please provide the train.log and train_epoch.log files?

I also noticed that in prepare_data_csv.csv the default segment duration is 4 seconds, but in conf/eres2net.yaml it is 3 seconds. May I ask why that is?
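For context on what the two settings change: the segment duration determines how many feature frames each training chunk contributes. A minimal sketch, assuming 16 kHz audio and a 10 ms fbank frame shift (typical speaker-verification defaults, not confirmed from this repo's config):

```python
# Assumed defaults; the actual values come from the feature config in the recipe.
SAMPLE_RATE = 16000
FRAME_SHIFT_MS = 10

def segment_frames(duration_s: float) -> int:
    """Number of feature frames a training chunk of `duration_s` seconds yields."""
    return int(duration_s * 1000 / FRAME_SHIFT_MS)

print(segment_frames(3.0))  # 300 frames per chunk with the yaml's 3 s setting
print(segment_frames(4.0))  # 400 frames per chunk with the csv default of 4 s
```

So the yaml's 3 s setting trains on shorter chunks than the 4 s csv default, which mainly trades memory and batch throughput against per-sample context.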

yfchenlucky commented 10 months ago

train_epoch.log:

```
epoch: 1 - train Avg_loss: 6.68, train Avg_acc: 15.58, train Lr_value: 4.00e-02
epoch: 2 - train Avg_loss: 2.07, train Avg_acc: 62.32, train Lr_value: 8.00e-02
epoch: 3 - train Avg_loss: 1.38, train Avg_acc: 75.12, train Lr_value: 1.20e-01
epoch: 4 - train Avg_loss: 1.19, train Avg_acc: 78.82, train Lr_value: 1.60e-01
epoch: 5 - train Avg_loss: 1.13, train Avg_acc: 80.03, train Lr_value: 2.00e-01
epoch: 6 - train Avg_loss: 1.07, train Avg_acc: 81.41, train Lr_value: 2.00e-01
epoch: 7 - train Avg_loss: 1.01, train Avg_acc: 82.61, train Lr_value: 2.00e-01
epoch: 8 - train Avg_loss: 9.65e-01, train Avg_acc: 83.32, train Lr_value: 1.99e-01
epoch: 9 - train Avg_loss: 9.32e-01, train Avg_acc: 83.95, train Lr_value: 1.98e-01
epoch: 10 - train Avg_loss: 9.09e-01, train Avg_acc: 84.43, train Lr_value: 1.97e-01
epoch: 11 - train Avg_loss: 8.86e-01, train Avg_acc: 84.84, train Lr_value: 1.96e-01
epoch: 12 - train Avg_loss: 8.79e-01, train Avg_acc: 84.96, train Lr_value: 1.94e-01
epoch: 13 - train Avg_loss: 8.61e-01, train Avg_acc: 85.27, train Lr_value: 1.93e-01
epoch: 14 - train Avg_loss: 8.48e-01, train Avg_acc: 85.57, train Lr_value: 1.91e-01
epoch: 15 - train Avg_loss: 8.36e-01, train Avg_acc: 85.86, train Lr_value: 1.89e-01
epoch: 16 - train Avg_loss: 8.23e-01, train Avg_acc: 86.08, train Lr_value: 1.86e-01
epoch: 17 - train Avg_loss: 8.13e-01, train Avg_acc: 86.17, train Lr_value: 1.84e-01
epoch: 18 - train Avg_loss: 8.01e-01, train Avg_acc: 86.48, train Lr_value: 1.81e-01
epoch: 19 - train Avg_loss: 7.94e-01, train Avg_acc: 86.55, train Lr_value: 1.78e-01
epoch: 20 - train Avg_loss: 7.81e-01, train Avg_acc: 86.78, train Lr_value: 1.75e-01
epoch: 21 - train Avg_loss: 1.06, train Avg_acc: 86.95, train Lr_value: 1.72e-01
epoch: 22 - train Avg_loss: 1.56, train Avg_acc: 88.16, train Lr_value: 1.68e-01
epoch: 23 - train Avg_loss: 2.04, train Avg_acc: 88.79, train Lr_value: 1.64e-01
epoch: 24 - train Avg_loss: 2.47, train Avg_acc: 89.23, train Lr_value: 1.61e-01
epoch: 25 - train Avg_loss: 2.87, train Avg_acc: 89.42, train Lr_value: 1.57e-01
epoch: 26 - train Avg_loss: 3.18, train Avg_acc: 89.68, train Lr_value: 1.53e-01
epoch: 27 - train Avg_loss: 3.43, train Avg_acc: 89.81, train Lr_value: 1.49e-01
epoch: 28 - train Avg_loss: 3.65, train Avg_acc: 89.99, train Lr_value: 1.44e-01
epoch: 29 - train Avg_loss: 3.83, train Avg_acc: 89.99, train Lr_value: 1.40e-01
epoch: 30 - train Avg_loss: 3.95, train Avg_acc: 90.09, train Lr_value: 1.35e-01
epoch: 31 - train Avg_loss: 4.02, train Avg_acc: 90.37, train Lr_value: 1.31e-01
epoch: 32 - train Avg_loss: 4.08, train Avg_acc: 90.52, train Lr_value: 1.26e-01
epoch: 33 - train Avg_loss: 4.10, train Avg_acc: 90.71, train Lr_value: 1.22e-01
epoch: 34 - train Avg_loss: 4.14, train Avg_acc: 90.73, train Lr_value: 1.17e-01
epoch: 35 - train Avg_loss: 4.13, train Avg_acc: 90.92, train Lr_value: 1.12e-01
epoch: 36 - train Avg_loss: 4.11, train Avg_acc: 91.15, train Lr_value: 1.07e-01
epoch: 37 - train Avg_loss: 4.10, train Avg_acc: 91.27, train Lr_value: 1.02e-01
epoch: 38 - train Avg_loss: 4.06, train Avg_acc: 91.43, train Lr_value: 9.76e-02
epoch: 39 - train Avg_loss: 4.05, train Avg_acc: 91.53, train Lr_value: 9.28e-02
epoch: 40 - train Avg_loss: 3.98, train Avg_acc: 91.76, train Lr_value: 8.80e-02
epoch: 41 - train Avg_loss: 3.95, train Avg_acc: 91.93, train Lr_value: 8.32e-02
epoch: 42 - train Avg_loss: 3.89, train Avg_acc: 92.10, train Lr_value: 7.85e-02
epoch: 43 - train Avg_loss: 3.85, train Avg_acc: 92.30, train Lr_value: 7.38e-02
epoch: 44 - train Avg_loss: 3.81, train Avg_acc: 92.33, train Lr_value: 6.91e-02
epoch: 45 - train Avg_loss: 3.73, train Avg_acc: 92.60, train Lr_value: 6.46e-02
epoch: 46 - train Avg_loss: 3.68, train Avg_acc: 92.72, train Lr_value: 6.01e-02
epoch: 47 - train Avg_loss: 3.61, train Avg_acc: 92.96, train Lr_value: 5.57e-02
epoch: 48 - train Avg_loss: 3.55, train Avg_acc: 93.08, train Lr_value: 5.14e-02
epoch: 49 - train Avg_loss: 3.49, train Avg_acc: 93.19, train Lr_value: 4.73e-02
epoch: 50 - train Avg_loss: 3.40, train Avg_acc: 93.51, train Lr_value: 4.32e-02
epoch: 51 - train Avg_loss: 3.33, train Avg_acc: 93.69, train Lr_value: 3.93e-02
epoch: 52 - train Avg_loss: 3.27, train Avg_acc: 93.80, train Lr_value: 3.56e-02
epoch: 53 - train Avg_loss: 3.21, train Avg_acc: 94.03, train Lr_value: 3.19e-02
epoch: 54 - train Avg_loss: 3.13, train Avg_acc: 94.12, train Lr_value: 2.85e-02
epoch: 55 - train Avg_loss: 3.06, train Avg_acc: 94.32, train Lr_value: 2.52e-02
epoch: 56 - train Avg_loss: 2.99, train Avg_acc: 94.48, train Lr_value: 2.21e-02
epoch: 57 - train Avg_loss: 2.90, train Avg_acc: 94.71, train Lr_value: 1.91e-02
epoch: 58 - train Avg_loss: 2.83, train Avg_acc: 94.90, train Lr_value: 1.64e-02
epoch: 59 - train Avg_loss: 2.75, train Avg_acc: 95.00, train Lr_value: 1.39e-02
epoch: 60 - train Avg_loss: 2.66, train Avg_acc: 95.16, train Lr_value: 1.15e-02
epoch: 61 - train Avg_loss: 2.61, train Avg_acc: 95.32, train Lr_value: 9.36e-03
epoch: 62 - train Avg_loss: 2.53, train Avg_acc: 95.52, train Lr_value: 7.43e-03
epoch: 63 - train Avg_loss: 2.43, train Avg_acc: 95.78, train Lr_value: 5.72e-03
epoch: 64 - train Avg_loss: 2.38, train Avg_acc: 95.80, train Lr_value: 4.22e-03
epoch: 65 - train Avg_loss: 2.32, train Avg_acc: 95.89, train Lr_value: 2.96e-03
epoch: 66 - train Avg_loss: 2.26, train Avg_acc: 96.04, train Lr_value: 1.91e-03
epoch: 67 - train Avg_loss: 2.22, train Avg_acc: 96.14, train Lr_value: 1.10e-03
epoch: 68 - train Avg_loss: 2.17, train Avg_acc: 96.24, train Lr_value: 5.17e-04
epoch: 69 - train Avg_loss: 2.15, train Avg_acc: 96.25, train Lr_value: 1.67e-04
epoch: 70 - train Avg_loss: 2.14, train Avg_acc: 96.24, train Lr_value: 5.00e-05
```
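For anyone comparing their own run against this log: the lines follow a regular `key: value` pattern and are easy to parse. A quick sketch (the field names are taken verbatim from the excerpt; the helper itself is hypothetical, not part of the repo):

```python
import re

# Matches one line of the train_epoch.log format shown above.
LINE_RE = re.compile(
    r"epoch:\s*(\d+)\s*-\s*train Avg_loss:\s*([\d.e+-]+),\s*"
    r"train Avg_acc:\s*([\d.e+-]+),\s*train Lr_value:\s*([\d.e+-]+)"
)

def parse_epoch_line(line: str) -> dict:
    """Parse a single log line into epoch, loss, accuracy, and learning rate."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    epoch, loss, acc, lr = m.groups()
    return {"epoch": int(epoch), "loss": float(loss),
            "acc": float(acc), "lr": float(lr)}

row = parse_epoch_line(
    "epoch: 70 - train Avg_loss: 2.14, train Avg_acc: 96.24, train Lr_value: 5.00e-05"
)
print(row)  # {'epoch': 70, 'loss': 2.14, 'acc': 96.24, 'lr': 5e-05}
```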

  1. You need to ensure the experimental configurations are the same (batch size, learning rate, and so on).
  2. Different training environments can lead to different convergence of the network. You can experiment by slightly adjusting the parameters. Generally, a relative error of less than 5% is acceptable.
  3. In prepare_data_csv.csv, the default segment duration can be regarded as a hyperparameter, and you can change it to test.
penguinwang96825 commented 10 months ago

Thanks for the prompt reply!

@yfchenlucky I didn't change any configurations in the yaml file. At epoch 70, my training accuracy is 95.88%, so I guess I just need to optimise the model further to get a result similar to yours.