Closed hkchengrex closed 4 years ago
Hi @hkchengrex, thanks for pointing out my mistake in the previous answer. In fact, pre-training runs for 2M samples, not iterations, so with a batch size of 4 that is about 500K iterations. In the paper, we report the training time roughly, without measuring it precisely. I am sorry if this caused any misunderstanding. You are right that pre-training takes about twice as long as fine-tuning. In our implementation, 260 epochs of fine-tuning are sufficient, since we reduce the learning rate on a regular schedule.
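To make the numbers concrete, here is a quick sanity check of the iteration counts discussed in this thread (the batch size of 4 and the 3771 samples per epoch are taken from the question below; nothing here is measured independently):

```python
# Sanity check of the iteration counts discussed in this thread.
# Sample counts and batch size come from the thread, not from the code base.

PRETRAIN_SAMPLES = 2_000_000   # pre-training runs over 2M samples
BATCH_SIZE = 4

MAIN_EPOCHS = 260              # fine-tuning epochs
SAMPLES_PER_EPOCH = 3771       # samples per epoch in main training

pretrain_iters = PRETRAIN_SAMPLES // BATCH_SIZE
main_iters = MAIN_EPOCHS * SAMPLES_PER_EPOCH // BATCH_SIZE

print(pretrain_iters)  # 500000
print(main_iters)      # 245115
```

So pre-training is roughly 500K iterations versus roughly 245K for main training, which is consistent with pre-training taking about twice as long.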
Thank you for the explanation!
Hi, thanks for your code and work.
I read in another issue #6 that the main training runs for 260 epochs with 3771 samples per epoch. That should be 260*3771/4 (batch size) ~ 245K iterations, while pre-training runs for 2M iterations. Why would pre-training take just 4 days but main training 3 days, as mentioned in the paper, given that each iteration should take approximately the same amount of time?
Am I missing something? I am trying to re-train the network, but 260 epochs seem insufficient. Thanks a lot!