lucidrains / byol-pytorch

Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch
MIT License

BYOL collapses #44

Open mark-selyaeff opened 3 years ago

mark-selyaeff commented 3 years ago

Has anybody experienced collapse while training BYOL? After training a ResNet50 for 3 epochs, about 80% of the scalars in the representation vector are zeros and the loss is below 0.01. Details: I'm using BYOL with momentum, batch size 256, accumulating gradients over 4096/256 = 16 consecutive steps. The optimizer is Adam with LR = 0.2, as mentioned in the paper.
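
For reference, a sketch of how one might measure the "fraction of zeros" being described, using this repo's `BYOL` wrapper. The `return_embedding` two-value return follows the repo's README; the random batch and the `1e-6` threshold are stand-ins, not the poster's actual code:

```python
import torch
from torchvision import models
from byol_pytorch import BYOL

# Rough collapse check: what fraction of the representation is (near-)zero?
resnet = models.resnet50()
learner = BYOL(resnet, image_size = 224, hidden_layer = 'avgpool')
learner.eval()

images = torch.randn(32, 3, 224, 224)  # stand-in for a real validation batch
with torch.no_grad():
    # with return_embedding = True, forward skips augmentation and just encodes
    projection, embedding = learner(images, return_embedding = True)

frac_zero = (embedding.abs() < 1e-6).float().mean().item()
print(f'fraction of ~zero scalars in the representation: {frac_zero:.1%}')
```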

lucidrains commented 3 years ago

@mark-selyaeff Hey Mark! So the LR in the paper is actually specific to LARS, I believe

For Adam, try using a smaller learning rate (3e-4).
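
For what it's worth, a training loop along the lines of this repo's README, with Adam at the suggested 3e-4 (the random images stand in for a real unlabelled data loader):

```python
import torch
from torchvision import models
from byol_pytorch import BYOL

resnet = models.resnet50(pretrained = True)

learner = BYOL(
    resnet,
    image_size = 256,
    hidden_layer = 'avgpool'
)

opt = torch.optim.Adam(learner.parameters(), lr = 3e-4)  # smaller LR for Adam

for _ in range(100):
    images = torch.randn(20, 3, 256, 256)  # stand-in for a sampled batch
    loss = learner(images)
    opt.zero_grad()
    loss.backward()
    opt.step()
    learner.update_moving_average()  # update the EMA target encoder
```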

theblackcat102 commented 3 years ago

I have successfully trained using Adam (5e-3) with the scaling trick: actual learning rate = (your learning rate) * (total batch size) / 256. However, I encounter collapse when I replace the ResNet with other backbones (EfficientNet-B0, B1, MobileNet v2, v3, ShuffleNet v2, v3). Lowering the learning rate works for the MobileNet series and ShuffleNet v2, but it failed for EfficientNet-B0 and ShuffleNet v3.
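
Plugging in the numbers from this thread (base LR 5e-3, effective batch 4096 from the earlier 256 x 16 accumulation setup), the scaling rule works out as follows; the variable names are just for illustration:

```python
# Linear LR scaling rule: actual_lr = base_lr * total_batch_size / 256
base_lr = 5e-3                # base learning rate quoted above
total_batch_size = 256 * 16   # per-step batch 256, accumulated over 16 steps
actual_lr = base_lr * total_batch_size / 256
print(actual_lr)              # 0.08
```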