Whale-ice opened this issue 1 year ago
Translation of the original post, for future reference: "Hello, thanks for the excellent method. I have a few questions. The experimental-details section of the paper states that the learning rate in the semi-supervised phase is a constant 0.001 and that the teacher model is updated every 25 epochs: 'Then the student is trained for 25 to 75 epochs depending on the amount of unlabeled data with learning rate 0.001, and the teacher is updated every 25 epochs.' May I ask why you don't use a one-cycle learning-rate schedule? And for small datasets, does the teacher model simply never get updated during training (since small datasets are trained for only 25 epochs)? Looking forward to your answer, thank you."
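To make the question concrete, here is a minimal PyTorch sketch of the schedule as I understand it from the quoted text: a student trained with a constant learning rate of 0.001, and a teacher refreshed from the student every 25 epochs. The tiny models, the mean-teacher-style consistency loss, and the epoch counts are stand-in assumptions for illustration, not the authors' actual code.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical tiny models standing in for the paper's teacher/student networks.
student = nn.Linear(4, 2)
teacher = copy.deepcopy(student)

# Constant learning rate of 0.001, as described in the paper
# (i.e., no OneCycleLR or other scheduler).
optimizer = torch.optim.Adam(student.parameters(), lr=0.001)

total_epochs = 75          # 25 to 75 depending on the amount of unlabeled data
teacher_update_every = 25  # teacher refreshed from the student every 25 epochs

for epoch in range(1, total_epochs + 1):
    # Placeholder for one epoch of student training against teacher targets.
    x = torch.randn(8, 4)
    loss = ((student(x) - teacher(x).detach()) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # With a 25-epoch run this fires only once, at the very end of training --
    # which is exactly the small-dataset situation the question asks about.
    if epoch % teacher_update_every == 0:
        teacher.load_state_dict(student.state_dict())
```

Under this reading, a small dataset trained for exactly 25 epochs would see its only teacher update at the final epoch, so the teacher's predictions never influence training after being refreshed.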