Implementation details for pretraining with VICReg on the 1000-class ImageNet dataset without
labels are as follows. The coefficients λ and μ are set to 25 and ν to 1 in Eq. (6), and ε is 0.0001 in Eq. (1).
We give more details on how we choose the coefficients of the loss function in Appendix C.3. The
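For concreteness, a minimal PyTorch sketch of the weighted loss of Eq. (6) with these coefficients is given below. The function and variable names (vicreg_loss, z_a, z_b, off_diagonal) are ours, not from the released code; the invariance, variance, and covariance terms follow their descriptions in the text.

```python
import torch
import torch.nn.functional as F

def off_diagonal(m):
    # Zero out the diagonal, keeping only the off-diagonal entries.
    return m - torch.diag(torch.diag(m))

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance term: mean-squared error between the two batches of embeddings.
    inv_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss keeping each embedding dimension's std above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance term: squared off-diagonal entries of each covariance matrix.
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)
    cov_loss = off_diagonal(cov_a).pow(2).sum() / d + off_diagonal(cov_b).pow(2).sum() / d

    return lam * inv_loss + mu * var_loss + nu * cov_loss
```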
encoder network fθ is a standard ResNet-50 backbone He et al. (2016) with 2048 output units. The
expander hφ is composed of two fully-connected layers with batch normalization (BN) Ioffe &
Szegedy (2015) and ReLU, and a third linear layer. The sizes of all 3 layers were set to 8192. As
with Barlow Twins, performance improves when the size of the expander layers is larger than the
dimension of the representation. The impact of the expander dimension on performance is studied in
Appendix C.5. The training protocol follows those of BYOL and Barlow Twins: LARS optimizer You
et al. (2017); Goyal et al. (2017) run for 1000 epochs with a weight decay of 10⁻⁶ and a learning
rate lr = batch_size/256 × base_lr, where batch_size is set to 2048 by default and base_lr is a
base learning rate set to 0.2. The learning rate follows a cosine decay schedule Loshchilov & Hutter
(2017), starting from 0 with 10 warmup epochs and a final value of 0.002.
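As an illustration of this schedule (linear scaling rule, linear warmup, then cosine decay), a minimal sketch is given below; the number of steps per epoch (≈625 for ImageNet at batch size 2048) is our assumption and is not stated in the text, and the LARS optimizer itself is omitted.

```python
import math

def learning_rate(step, base_lr=0.2, batch_size=2048, warmup_epochs=10,
                  total_epochs=1000, steps_per_epoch=625, final_lr=0.002):
    # Linear scaling rule: lr = batch_size / 256 * base_lr.
    peak_lr = base_lr * batch_size / 256
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak learning rate down to the final value.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))
```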
A base VICReg experiment configuration summarizing these settings is given below.
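This is a hedged summary of the hyperparameters stated above, collected into a single dictionary; the field names are illustrative and do not necessarily match the released code.

```python
# Hypothetical configuration dictionary; values follow the text above.
VICREG_BASE_CONFIG = dict(
    dataset="ImageNet-1k (no labels)",
    encoder="resnet50",                 # f_theta, 2048-d representation
    expander_dims=[8192, 8192, 8192],   # h_phi: Linear-BN-ReLU x2 + Linear
    sim_coeff=25.0,                     # lambda, invariance term in Eq. (6)
    std_coeff=25.0,                     # mu, variance term in Eq. (6)
    cov_coeff=1.0,                      # nu, covariance term in Eq. (6)
    eps=1e-4,                           # epsilon in Eq. (1)
    optimizer="LARS",
    epochs=1000,
    batch_size=2048,
    base_lr=0.2,                        # lr = batch_size / 256 * base_lr
    weight_decay=1e-6,
    warmup_epochs=10,
    final_lr=0.002,                     # end value of the cosine schedule
)
```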