wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

Failed to load checkpoint #209

Closed: axuan731 closed this issue 10 months ago

axuan731 commented 10 months ago

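Hello, and thank you for your contribution. I have some questions about training. I stopped training at epoch 46, at which point the accuracy was 18% (image 1). I then tried to resume training from epoch 47, and the accuracy dropped sharply; likewise, when I previously loaded a model that had 60% accuracy, it fell to 40% (image 2). The resumed model also cannot advance to the next epoch, because it fails with a timeout error (image 3). I resumed by adding the command-line option --checkpoint /.../model_46.pt at stage 3. What is causing this problem?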

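In addition, during the LMF stage the accuracy also drops from 90% to 70%, but training can still advance to the next epoch, and the EER still decreases. Thank you very much for your help.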

JiJiJiang commented 10 months ago

Acc 18% at epoch 46 is too low. What is the Acc after epoch 20?

axuan731 commented 10 months ago

> Acc 18% at epoch 46 is too low. What is the Acc after epoch 20?

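Hello. The accuracy was 62% at the start of epoch 20 and 46% at the end of epoch 20. At the start of epoch 21 it was 40%, and by epoch 28 it had dropped to its lowest point, 8%.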

JiJiJiang commented 10 months ago

The accuracies are a little low. I guess you trained the model with a large amount of training data, since your largest batch id is 16036. Please check your training config first, e.g., batch size, number of warmup epochs, etc. The default setup may not be appropriate for your training data.
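To make the config check above more concrete, here is a minimal sketch in plain Python of how the number of iterations per epoch interacts with an epoch-based warmup, and of where a resumed run would need to re-enter the schedule. It is not wespeaker's actual scheduler: the function names, the schedule shape (linear warmup followed by exponential decay), and all numeric values except the batch id 16036 are assumptions chosen for illustration. It also assumes batch ids are zero-based, so that one epoch has 16037 iterations.

```python
def lr_at_step(step, iters_per_epoch, num_epochs, warmup_epochs,
               initial_lr, final_lr):
    """Hypothetical schedule: linear warmup from 0 to initial_lr,
    then exponential decay from initial_lr to final_lr."""
    warmup_steps = warmup_epochs * iters_per_epoch
    total_steps = num_epochs * iters_per_epoch
    if step < warmup_steps:
        return initial_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed so far, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return initial_lr * (final_lr / initial_lr) ** progress


def resume_step(last_finished_epoch, iters_per_epoch):
    """Global step at which a run resumed after `last_finished_epoch`
    full epochs should re-enter the schedule."""
    return last_finished_epoch * iters_per_epoch


# Assumed values for illustration only (not taken from the issue),
# except 16037, which is derived from the largest batch id 16036.
ITERS_PER_EPOCH = 16037
NUM_EPOCHS = 150
WARMUP_EPOCHS = 6
INITIAL_LR = 0.1
FINAL_LR = 1e-5

if __name__ == "__main__":
    warmup_steps = WARMUP_EPOCHS * ITERS_PER_EPOCH
    print("warmup lasts", warmup_steps, "optimizer steps")
    step = resume_step(46, ITERS_PER_EPOCH)
    print("resuming after epoch 46 re-enters the schedule at step", step)
    print("learning rate there:",
          lr_at_step(step, ITERS_PER_EPOCH, NUM_EPOCHS, WARMUP_EPOCHS,
                     INITIAL_LR, FINAL_LR))
```

With this many iterations per epoch, a warmup or decay length expressed in epochs covers a very large number of optimizer steps, which is one reason defaults tuned for a smaller dataset may not transfer. Comparing the learning rate logged right after resuming against the value such a schedule would give at the corresponding global step is a quick sanity check.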

axuan731 commented 10 months ago

> The accuracies are a little low. I guess you trained the model with a large amount of training data, since your largest batch id is 16036. Please check your training config first, e.g., batch size, number of warmup epochs, etc. The default setup may not be appropriate for your training data.

Thank you for the answer!