Closed Zora137 closed 6 months ago
Hi, the results are too bad. As stated in our paper:
For models trained from scratch an initial learning rate of 5e−4 with a cosine learning rate schedule [26] is adopted, and the training epoch is set to 35.
Could you please try using a larger initial learning rate?
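For reference, the schedule described above (initial rate 5e-4 decaying over 35 epochs) can be sketched generically like this. This is a plain cosine-annealing formula, not the repo's actual scheduler code; the function name and the `min_lr` parameter are illustrative assumptions.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=5e-4, min_lr=0.0):
    """Generic cosine-annealed learning rate (a sketch, not Lite-Mono's code):
    starts at base_lr at epoch 0 and decays to min_lr at the final epoch."""
    t = epoch / max(1, total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

With `total_epochs=35` this starts at 5e-4 and decays smoothly, which is why a run that diverges early usually points to the initial rate or the dependencies rather than the schedule itself.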
Hi, nice work!!! Can you show me your args file? I have run it many times and the result is always like this (trained by Lite-Mono).
Hi, you can try setting the learning rate to `--lr 0.0001 5e-6 16 0.0001 1e-5 16`. `drop_path` can be set to 0.3, but this might cause your training not to converge. Please make sure you are using the same dependencies as we used: https://github.com/noahzn/Lite-Mono/issues/58
Also, please check the results of each epoch, not only the last epoch. The best result should be achieved at an earlier epoch.
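Picking the best epoch instead of the last one can be as simple as the following sketch (a hypothetical helper, not code from the repo), here assuming a lower-is-better depth metric such as abs_rel:

```python
def best_epoch(abs_rel_per_epoch):
    """Return (epoch_index, value) of the lowest abs_rel error across epochs.
    Illustrative only: evaluate every epoch's checkpoint, then keep the best."""
    best = min(range(len(abs_rel_per_epoch)), key=lambda i: abs_rel_per_epoch[i])
    return best, abs_rel_per_epoch[best]
```

For example, `best_epoch([0.2, 0.15, 0.18])` returns epoch 1, even though epoch 2 is the last one trained.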
ok thanks!!! I will try it
Hello, sorry to bother you again. This is the result I get when running your code with this command, without pretraining.
It cannot reach the result in your paper without pretraining.
Is this normal, or did I do something wrong? Looking forward to your answer, thank you so much!