imbue-ai / self_supervised

A Pytorch-Lightning implementation of self-supervised algorithms

Learning rate scaling? #1

Closed. sachit-menon closed this issue 4 years ago.

sachit-menon commented 4 years ago

Hi, thanks for this great exploration of BYOL! I have a (perhaps mundane) question about the implementation here; you note in the README that

(the batch_size and lr differ from the moco documentation due to the way Pytorch-Lightning handles multi-gpu training in ddp -- the effective numbers are batch_size=256 and lr=0.03)

I understand that in the official MoCo code, they manually scale batch_size to batch_size/n_gpus when using ddp (https://github.com/facebookresearch/moco/blob/master/main_moco.py#L174 for reference). So batch_size=32 makes sense to me, since Lightning's ddp also wraps torch.nn.parallel.DistributedDataParallel.
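
For concreteness, the per-GPU division works roughly as sketched below; the variable names are mine and this is only meant to illustrate the idea, not to reproduce the MoCo script:

```python
import torch

# Illustrative only: under DistributedDataParallel each process sees one
# shard of the data, so the global batch size is divided across GPUs up front.
n_gpus = max(torch.cuda.device_count(), 1)        # e.g. 8
global_batch_size = 256                           # value from the MoCo recipe
per_gpu_batch_size = global_batch_size // n_gpus  # 256 / 8 = 32

# Each per-process DataLoader would then be built with the per-GPU size, e.g.
# DataLoader(dataset, batch_size=per_gpu_batch_size,
#            sampler=torch.utils.data.distributed.DistributedSampler(dataset))
```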

However, I don't really understand the change in lr - could you explain why you scale to lr*n_gpus? The MoCo example doesn't seem to do this scaling, so I'm wondering what about Lightning results in needing the change. Any input would be really appreciated!
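
To make the question concrete, the relationship being asked about is roughly the linear lr-scaling heuristic sketched below; the numbers are assumptions taken from the README's stated effective values, not the repo's actual code:

```python
# Assumed values: per-GPU batch of 32 on 8 GPUs, matching the README's
# stated effective batch_size=256 and lr=0.03.
base_lr = 0.03            # lr the MoCo docs pair with batch_size=256
base_batch_size = 256
per_gpu_batch_size = 32
n_gpus = 8

effective_batch_size = per_gpu_batch_size * n_gpus                 # 256 under ddp
# Linear scaling rule: keep lr proportional to the effective batch size.
effective_lr = base_lr * effective_batch_size / base_batch_size    # stays 0.03 here
print(effective_batch_size, effective_lr)
```

Since the effective batch size already matches 256, this heuristic would not call for an extra lr*n_gpus factor, which is what prompts the question.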

abefetterman commented 4 years ago

Thanks for the report @sachit-menon!

It looks like this lr change was specific to an earlier version of pytorch-lightning that we used for the original MoCo runs (0.7.1). Because this has been fixed in more recent versions, the lr should be the same as recommended (lr=0.03). I've updated the documentation. Thanks again!
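
In other words, with a recent pytorch-lightning the MoCo lr can be used as-is. A minimal sketch of that configuration is below; the module name and the optimizer hyperparameters other than lr are assumptions, and Trainer arguments vary across pytorch-lightning versions:

```python
import torch
import pytorch_lightning as pl

class MoCoModule(pl.LightningModule):  # hypothetical module name
    def configure_optimizers(self):
        # No manual lr scaling: use the lr from the MoCo recipe directly.
        # momentum/weight_decay are the usual MoCo values, assumed here.
        return torch.optim.SGD(self.parameters(), lr=0.03,
                               momentum=0.9, weight_decay=1e-4)

# Per-GPU batch_size=32 on 8 GPUs gives an effective batch of 256.
# trainer = pl.Trainer(gpus=8, distributed_backend="ddp")  # kwargs circa 0.8.x
# trainer.fit(MoCoModule(), train_loader)
```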