speechbrain / speechbrain

A PyTorch-based Speech Toolkit
http://speechbrain.github.io
Apache License 2.0

ECAPA-TDNN validation error #980

Closed zhaoyiming closed 3 years ago

zhaoyiming commented 3 years ago

I found some errors in my VoxCeleb training experiments.

  1. The valid error rate is 1 from the 11th epoch.

     epoch: 1, lr: 7.46e-04 - train loss: 5.50 - valid loss: 1.56, valid ErrorRate: 4.64e-02
     epoch: 2, lr: 9.76e-04 - train loss: 2.11 - valid loss: 1.16, valid ErrorRate: 3.29e-02
     epoch: 3, lr: 2.30e-04 - train loss: 1.75 - valid loss: 6.52e-01, valid ErrorRate: 2.00e-02
     epoch: 4, lr: 5.17e-04 - train loss: 1.72 - valid loss: 8.21e-01, valid ErrorRate: 2.40e-02
     epoch: 5, lr: 7.37e-04 - train loss: 1.31 - valid loss: 7.39e-01, valid ErrorRate: 2.12e-02
     epoch: 6, lr: 9.15e-06 - train loss: 1.53 - valid loss: 5.39e-01, valid ErrorRate: 1.75e-02
     epoch: 7, lr: 7.55e-04 - train loss: 1.36 - valid loss: 8.14e-01, valid ErrorRate: 2.36e-02
     epoch: 8, lr: 4.98e-04 - train loss: 1.18 - valid loss: 5.41e-01, valid ErrorRate: 1.74e-02
     epoch: 9, lr: 2.48e-04 - train loss: 1.43 - valid loss: 5.60e-01, valid ErrorRate: 1.78e-02
     epoch: 10, lr: 9.94e-04 - train loss: 1.13 - valid loss: 7.33e-01, valid ErrorRate: 2.17e-02
     epoch: 11, lr: 7.46e-04 - train loss: 1.33 - valid loss: 0.00e+00, valid ErrorRate: 1.00e+00

  2. The valid ErrorRate is 1 from the first epoch.

     epoch: 1, lr: 1.18e-06 - train loss: 17.07 - valid loss: 17.32, valid ErrorRate: 1.00e+00
     epoch: 2, lr: 1.18e-06 - train loss: 16.90 - valid loss: 17.41, valid ErrorRate: 1.00e+00
     epoch: 3, lr: 1.18e-06 - train loss: 17.55 - valid loss: 6.51e-01, valid ErrorRate: 1.00e+00
     epoch: 4, lr: 2.36e-06 - train loss: 17.57 - valid loss: 6.73e-01, valid ErrorRate: 1.00e+00
     epoch: 5, lr: 3.55e-06 - train loss: 17.41 - valid loss: 6.73e-01, valid ErrorRate: 1.00e+00
     epoch: 6, lr: 4.73e-06 - train loss: 17.38 - valid loss: 6.71e-01, valid ErrorRate: 1.00e+00
     epoch: 7, lr: 5.92e-06 - train loss: 17.44 - valid loss: 5.93e-01, valid ErrorRate: 1.00e+00
     epoch: 8, lr: 7.10e-06 - train loss: 17.29 - valid loss: 6.25e-01, valid ErrorRate: 1.00e+00
     epoch: 9, lr: 8.29e-06 - train loss: 17.25 - valid loss: 6.51e-01, valid ErrorRate: 1.00e+00
     epoch: 10, lr: 9.47e-06 - train loss: 17.41 - valid loss: 6.73e-01, valid ErrorRate: 1.00e+00
     epoch: 11, lr: 1.07e-05 - train loss: 17.06 - valid loss: 5.28e-01, valid ErrorRate: 1.00e+00
     epoch: 12, lr: 1.18e-05 - train loss: 16.85 - valid loss: 6.00e-01, valid ErrorRate: 1.00e+00
     epoch: 13, lr: 1.30e-05 - train loss: 16.78 - valid loss: 5.75e-01, valid ErrorRate: 1.00e+00
     epoch: 14, lr: 1.42e-05 - train loss: 16.71 - valid loss: 5.24e-01, valid ErrorRate: 1.00e+00
     epoch: 15, lr: 1.54e-05 - train loss: 16.62 - valid loss: 6.36e-01, valid ErrorRate: 1.00e+00
     epoch: 16, lr: 1.66e-05 - train loss: 16.41 - valid loss: 5.51e-01, valid ErrorRate: 1.00e+00
     epoch: 17, lr: 1.78e-05 - train loss: 16.24 - valid loss: 5.23e-01, valid ErrorRate: 1.00e+00
     epoch: 18, lr: 1.89e-05 - train loss: 16.23 - valid loss: 6.17e-01, valid ErrorRate: 1.00e+00
     epoch: 19, lr: 2.01e-05 - train loss: 15.86 - valid loss: 6.53e-01, valid ErrorRate: 1.00e+00
     epoch: 20, lr: 2.13e-05 - train loss: 15.83 - valid loss: 5.02e-01, valid ErrorRate: 1.00e+00

Maybe my dev dataset is wrong? The saved checkpoints give normal results on the test dataset. The wrong validation results appear at random.

I have another question: with the provided model we get 0.86% EER on veri_test (not clean), 0.79% EER on veri_test (s-norm), 0.69% EER on veri_test2 (clean), and 0.63% EER on veri_test2 (s-norm). Is that right?

Thank you!

mravanelli commented 3 years ago

The validation error decreases very fast on VoxCeleb data. Are you using VoxCeleb as well? There might be some kind of label mismatch in the dev data. Yes, I got 0.69% EER on veri_test2 (clean). I don't remember the numbers on the other subsets.
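
One quick way to test the label-mismatch idea is to compare the speaker labels in the dev manifest against those in the training manifest. The snippet below is only a minimal sketch: it assumes both manifests are CSV files with a speaker column named `spk_id` (adjust the name to whatever your data preparation writes), and the helper names `speaker_ids` and `unseen_dev_speakers` are made up for illustration. If the classifier's labels come from the training manifest, any dev speaker missing from it cannot be classified correctly.

```python
import csv
import io


def speaker_ids(csv_file, column="spk_id"):
    """Return the set of speaker labels found in one manifest.

    The column name "spk_id" is an assumption about the manifest layout.
    """
    return {row[column] for row in csv.DictReader(csv_file)}


def unseen_dev_speakers(train_file, dev_file, column="spk_id"):
    """Return dev speakers that never occur in the training manifest."""
    return speaker_ids(dev_file, column) - speaker_ids(train_file, column)


# Tiny in-memory manifests standing in for the real train/dev CSV files.
train_csv = io.StringIO("ID,spk_id\nutt1,id00001\nutt2,id00002\n")
dev_csv = io.StringIO("ID,spk_id\nutt3,id00001\nutt4,id09999\n")

missing = unseen_dev_speakers(train_csv, dev_csv)
print(sorted(missing))  # speakers present in dev but absent from train
```

With real data, pass open file handles for the two manifests instead of the `io.StringIO` objects. An empty result rules out this particular kind of mismatch, but not others, such as labels being encoded differently between runs.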

zhaoyiming commented 3 years ago

Yes, VoxCeleb1+2. Because of this strange problem, I am temporarily using the dev set you provided while I look for what went wrong. The second question is settled. Thanks!

dragen1860 commented 3 years ago

@zhaoyiming Hi, I am also training on VoxCeleb1+2. Can we keep in touch via WeChat? My ID is dragen1860. Thank you.