Closed hkchengrex closed 4 years ago
Hi @hkchengrex, thanks for pointing out my mistake in the previous answer. In fact, pre-training runs for 2M samples, not iterations, so with a batch size of 4 that is about 500K iterations. In the paper, we report the training time roughly, without measuring it precisely. I am sorry if this caused any misunderstanding. You are right that pre-training takes about twice as long as fine-tuning. In our implementation, 260 epochs of fine-tuning are sufficient, since we reduce the learning rate on a regular schedule.
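To make the numbers concrete, here is a quick sanity check of the iteration counts discussed in this thread (the batch size of 4 and the 3771 samples per epoch are taken from the question below; nothing here is measured independently):

```python
# Sanity check of the iteration counts discussed in this thread.
# Sample counts and batch size come from the thread, not from the code base.

PRETRAIN_SAMPLES = 2_000_000   # pre-training runs over 2M samples
BATCH_SIZE = 4

MAIN_EPOCHS = 260              # fine-tuning epochs
SAMPLES_PER_EPOCH = 3771       # samples per epoch in main training

pretrain_iters = PRETRAIN_SAMPLES // BATCH_SIZE
main_iters = MAIN_EPOCHS * SAMPLES_PER_EPOCH // BATCH_SIZE

print(pretrain_iters)  # 500000
print(main_iters)      # 245115
```

So pre-training is roughly 500K iterations versus roughly 245K for main training, which is consistent with pre-training taking about twice as long.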
Thank you for the explanation!
Hi, thanks for your code and work.
I read in another issue #6 that the main training runs for 260 epochs with 3771 samples per epoch. That should be 260*3771/4 (batch size) ~ 245K iterations, while pre-training runs for 2M iterations. Why would pre-training take just 4 days but main training 3 days, as mentioned in the paper, given that each iteration should take approximately the same amount of time?
Am I missing something? I am trying to re-train the network, but 260 epochs seem insufficient. Thanks a lot!