Could I use this trick in normal model training?

Hi,

Thanks for providing this great paper and code. I am trying to use the method you proposed in normal classification task with dataset such as cifar-10. To make sure that I have understand the paper correctly, I feel I better ask you for some guidance:

Suppose my baseline cifar-10 classification model is WideResnet28-1, and I use batch size of 256 with cosine annealing lr scheduler. The initial learning rate is thus 0.2. The augmentation method is horizontal flip and random cropping after padded 4 pixels. Apart from these normal settings, I also used mixup to train the model.

The question is: what is the most suitable way to add self-supervision to the above training procedure? Here is my assumption: I should add a new 4-way classification fc layer head in parallel with the 10-way classification head to the model. The total loss thus should become L_10 + 0.5*L_4 according to the paper. As for the dataset, I first implement normal h-flip and random cropping augmentation, and then rotate the cropped and flipped image in (0, 90, 180, 360) to make the batch size to be 2564=1024. Since the batch size is amplified, I should also amplify the learning rate to 0.24=0.8. As for the mixup part, I should mix the 10-way classification labels as well as the 4-way rotation labels and then use cross entropy to compute the loss respectively.

Is this the correct way to use your method in normal classification?

hendrycks / ss-ood

Could I use this trick in normal model training? #1