SimCLRv2 experiments

mmatena commented 3 years ago

Note: I should also try some experiments with supervised ImageNet pretrained models. A lot of the stuff here will be the same as for them, but I'll just focus on SimCLR stuff here.

mmatena commented 3 years ago

Notes

I'll use models without the selective kernels as the K-FAC-style convolutional Fisher is probably going to be easier to apply without them.
I'm going to use the tasks caltech101, cifar100, dtd, and oxford_iiit_pet since SimCLR showed a significant boost over random initialization on those tasks.
- I'm skipping VOC 2007 as it is an object detection task and skipping SUN 397 as it is a large (36 GB) download.
I'll use a learning rate of 1e-3. The SimCLR paper did a grid of 7 logarithmically spaced learning rates between 1e-4 and 1e-1.
- I chose 1e-3 as a first pass. I should examine training and see if we should lower or raise it.
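For reference, here is a minimal sketch of what a grid like that looks like. It assumes both endpoints (1e-4 and 1e-1) are included, which the description suggests but which I haven't checked against the paper's code. The helper name `log_spaced` is mine.

```python
import math


def log_spaced(lo, hi, n):
    """Return n values spaced evenly in log10 between lo and hi, inclusive."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (n - 1)
    return [10 ** (log_lo + i * step) for i in range(n)]


# Assumption: 7 points from 1e-4 to 1e-1 with both endpoints on the grid.
learning_rates = log_spaced(1e-4, 1e-1, 7)

for lr in learning_rates:
    print(f"{lr:.2e}")
```

Under that assumption the grid steps by half a decade, so 1e-3 sits on the grid as its third point, with two grid points below it and four above.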
The paper used a batch size of 256 for 20k steps, using SGD with Nesterov momentum and a momentum parameter of 0.9. They set the momentum parameter for the batch normalization statistics to max(1 − 10/s, 0.9), where s is the number of steps per epoch.
- I'll ignore the batch normalization momentum stuff at least in the first pass and try to match the rest of their settings.
- I'll use a batch size of 32 for memory reasons. I'll also use Adam as I am more familiar with it. I'll train for 80k steps, which means I'll see half the total number of examples that I would with the original params.
After a first pass and examining the performance, I'll be using the checkpoints from 20k steps into training for merging.

mmatena commented 3 years ago

First pass

It looks like I was just fine-tuning the head weights and not the body for the first pass. I think it's because I was setting trainable=False, which I did because I was otherwise getting gradient not found errors.

I might hold off a bit until I get a working SimCLR implementation.

Also look at the learning rate and regularization strengths that I chose.

mmatena commented 3 years ago

Second pass

By "second pass" I mean the runs after I switched away from frozen-body fine-tuning.

I'll go for 30k steps and probably a learning rate of 1e-4.
- The learning rate is fairly arbitrary.
I might also want to adjust the weight decay strengths as it appears the higher ones significantly affect training.

mmatena / m251

SimCLRv2 experiments #8