Hi,
I noticed that your paper reports 20 training epochs with 5 warmup epochs, but when I tried to reproduce the results with train.sh, it runs for 100 epochs with warmup_epochs set to 10.
Another discrepancy is weight_decay: the paper indicates that the end value is 0.1, while the code uses 0.4.
Do I need to change the code to align it with the paper in order to reproduce the results?
Thanks!