Closed b03505036 closed 5 years ago
Hi Ken
Regarding learning-rate scheduling: yes, you should apply the linear learning-rate rule according to the changed batch size. However, in my experiments, where I resized the input and changed the total number of iterations for fast prototyping, the linear learning-rate rule led to divergence for the FSAF model. To remedy this, I tried the default learning rate instead, and it produced stable training. Thus, I used the default learning rate for both the baseline and FSAF.
Additionally, for the original setting, where 8 GPUs and a larger input size are used, linear learning-rate scaling might *not* lead to divergence. But I didn't check this due to my limited resources.
Hi, thanks for sharing, it's very helpful. I have a question about the learning rate. I saw your GPU number is 4 and num_images_per_GPU is 8, with a learning rate of 0.01. But according to the original mmdetection setting (8 GPUs, 2 images per GPU, lr = 0.01), shouldn't it be 0.01 * 2?
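For reference, the arithmetic behind the question above can be sketched as follows. This is a minimal illustration of the linear scaling rule using only the numbers mentioned in this thread; `linear_scaled_lr` is a hypothetical helper, not an mmdetection function:

```python
def linear_scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: scale the reference learning rate
    proportionally to the change in total batch size."""
    return base_lr * new_batch_size / base_batch_size

# mmdetection reference setting: 8 GPUs x 2 images/GPU = batch 16, lr 0.01
# setting discussed here:        4 GPUs x 8 images/GPU = batch 32
print(linear_scaled_lr(0.01, 8 * 2, 4 * 8))  # -> 0.02
```

So with twice the total batch size, the rule would suggest lr = 0.02, which is the 0.01 * 2 the question refers to; the author's point above is that this scaled value diverged in their smaller-input setup, so they kept 0.01.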