huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0

Training hyperparameters for AugMix ResNet? #78

Closed njuerect closed 4 years ago

njuerect commented 4 years ago

Hi, thanks for this great work! I'd like to reproduce your results with ResNet50 using the AugMix feature. Could you share the training hyperparameters? Also, any recommendations on data augmentation usage (alongside AugMix) would be appreciated.

rwightman commented 4 years ago

@njuerect I intended to have them posted by now, but I'm currently changing some hardware on the machine I used for the training; I'll grab the details when it's back up later this week.

njuerect commented 4 years ago

Thanks for your reply! Waiting for your commit! Also, after investigating the AugMix paper, I'm trying to train ResNet with a data augmentation setting of AugMix plus random erasing, and the JSD loss. Some details: augmix-m5-w5-d2-a1, reprob 0.5, remode pixel, JSD loss, aug-splits: 2.

rwightman commented 4 years ago

I can provide a bit more help with your current hparams. Without seeing the full details, I know I used augmix-m1-mstd0.5 (I started with something stronger, but the results didn't work out, so I went back to the value suggested by the original authors; I have something running with m3 now).

I did use random erasing with my 79% run; I think that was a big part of the success. My reprob was 0.3 (running a test with 0.4 now) and the mode was also pixel, but I set --recount 3.

JSD loss should be enabled, yes, but aug-splits should be 3 to match the paper (1 clean + 2 aug).
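For reference, the JSD consistency loss from the AugMix paper averages the predicted distributions of the clean and two augmented views into a mixture M, then penalizes the mean KL divergence of each view from M. A minimal pure-Python sketch of that math (not timm's actual implementation, which operates on batched logits):

```python
import math

def kl_div(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd_consistency(p_clean, p_aug1, p_aug2):
    """Jensen-Shannon divergence among clean + 2 augmented predictions,
    as used for the AugMix consistency term (0 when all three agree)."""
    mixture = [(a + b + c) / 3.0 for a, b, c in zip(p_clean, p_aug1, p_aug2)]
    return (kl_div(p_clean, mixture)
            + kl_div(p_aug1, mixture)
            + kl_div(p_aug2, mixture)) / 3.0
```

The value is 0 when the three distributions are identical and grows as the augmented views' predictions drift from the clean one, which is what pushes the network toward augmentation-invariant features.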

Keep in mind that with aug-splits active, your effective batch size ends up being 3x your -b argument, but I found it works best to set the learning rate based on the non-multiplied batch size. For LR scheduling I was using my typical setup for ResNets, cosine + SGD. I'd set the epoch count higher than usual; I did 180 in the first pass, then resumed from the best result at epoch 178 and ran to 200 epochs, and that was the result I used.
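Pulling the pieces mentioned in this thread together, a training invocation might look roughly like the following. This is an illustrative sketch, not the confirmed recipe: the GPU count, dataset path, batch size, and learning rate are placeholders, and flag names follow timm's train.py of that era.

```shell
# Illustrative sketch only, assembling the hparams discussed in this thread:
# AugMix (magnitude 1, mstd 0.5), 3 aug-splits (1 clean + 2 augmented) with
# JSD loss, random erasing (prob 0.3, pixel mode, count 3), SGD + cosine.
# GPU count, data path, batch size, and LR below are placeholders.
./distributed_train.sh 4 /path/to/imagenet \
    --model resnet50 -b 64 \
    --aa augmix-m1-mstd0.5 \
    --aug-splits 3 --jsd \
    --reprob 0.3 --remode pixel --recount 3 \
    --opt sgd --sched cosine --epochs 200 --lr 0.1
```

Note that with `--aug-splits 3`, each step processes 3x the `-b` value per GPU, so memory use grows accordingly even though, per the advice above, the LR is set from the non-multiplied batch size.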

njuerect commented 4 years ago

Thanks very much for sharing, that will be very helpful. Because I don't have many GPU resources, setting aug-splits to 3 makes the batch size 1/3 of the original value, and training is very slow. And yes, I'm using cosine + SGD for LR scheduling, sometimes cyclical cosine annealing. I will try your suggestions, thanks again!

rwightman commented 4 years ago

So, I'm still having issues with the machine from the original experiment; I'll get those details eventually as a reference. I had some results from another experiment finish this week, slightly better than the original: not statistically significant on ImageNet validation (79.038 vs 78.994), but the performance on other test sets such as ImageNetV2 and ImageNet-Sketch is better.

The command line is in the README: https://github.com/rwightman/pytorch-image-models#resnet50-with-jsd-loss-and-randaugment-clean--2x-ra-augs---7904-top-1-9439-top-5