sachit-menon closed this issue 4 years ago
Thanks for the report @sachit-menon!
It looks like this lr change was specific to an earlier version of pytorch-lightning we used for the original MoCo runs (0.7.1). Because this has been fixed in more recent versions, the lr should be the same as recommended (lr=0.03). I've updated the documentation. Thanks again!
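For anyone landing here later, a rough sketch of the arithmetic (assuming MoCo's published defaults of lr=0.03 at an effective batch size of 256; the GPU count below is just a hypothetical example):

```python
# Sketch of the effective-batch-size arithmetic under ddp.
# Assumes MoCo's defaults (lr=0.03 for an effective batch of 256);
# n_gpus here is only an illustrative value.
n_gpus = 8
per_gpu_batch_size = 32                               # what each ddp process sees
effective_batch_size = per_gpu_batch_size * n_gpus    # 256 across all processes

# Linear scaling rule relative to MoCo's reference batch of 256:
base_lr = 0.03
lr = base_lr * effective_batch_size / 256             # stays at 0.03, no extra n_gpus factor

print(effective_batch_size, lr)  # 256 0.03
```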
Hi, thanks for this great exploration of BYOL! I have a (perhaps mundane) question about the implementation here; you note in the README that the `batch_size` and `lr` are adjusted when training with ddp across multiple GPUs.
I understand that in the official MoCo code, they manually scale `batch_size` to be `batch_size/n_gpus` when using ddp (https://github.com/facebookresearch/moco/blob/master/main_moco.py#L174 for reference). So `batch_size=32` makes sense to me, as Lightning's ddp wraps `nn.parallel`. However, I don't really understand the change in `lr` - could you explain why you scale to `lr*n_gpus`? The MoCo example doesn't seem to do this scaling, so I'm wondering what about Lightning results in needing the change. Any input would be really appreciated!
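For concreteness, here is roughly what I mean by the two conventions (a sketch, not copied verbatim from either codebase; the function and variable names are illustrative):

```python
# MoCo-style: a global batch size is passed on the command line and the script
# divides it by the number of GPUs so each process gets its share
# (roughly what main_moco.py does around the linked line).
def moco_style_per_gpu_batch(global_batch_size: int, n_gpus: int) -> int:
    return int(global_batch_size / n_gpus)

# Lightning-style ddp: the batch_size passed to the DataLoader is already
# per-process, so no manual division is needed.
def lightning_style_per_gpu_batch(per_gpu_batch_size: int) -> int:
    return per_gpu_batch_size

# Both end up with 32 samples per GPU for a 256 total batch on 8 GPUs.
assert moco_style_per_gpu_batch(256, 8) == lightning_style_per_gpu_batch(32) == 32
```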