huawei-noah / Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Training hyperparams on ImageNet #53

Closed seanzhuh closed 3 years ago

seanzhuh commented 3 years ago

Hi, thanks for sharing such great work. I'd like to reproduce your results on ImageNet. Could you please specify the training hyper-parameters, such as the initial learning rate, how it is decayed, the batch size, etc.? It would be even better if you could share the tricks used to train GhostNet, such as label smoothing and data augmentation. Thanks!

iamhankai commented 3 years ago

Please refer to the training settings in https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/tinynet

seanzhuh commented 3 years ago

Hi, thanks for your quick reply! I'm not familiar with your proposed TinyNet, so here are the best parameters I could get from reading your GhostNet source code:

Note: the following is based on the PyTorch API

  1. For data augmentation you use:

     training:
     torchvision.transforms.RandomResizedCrop((224, 224), scale=(0.08, 1.), ratio=(3./4., 4./3.), interpolation=Image.BICUBIC)
     torchvision.transforms.RandomHorizontalFlip()
     torchvision.transforms.ColorJitter(brightness=(0.6, 1.4), contrast=(0.6, 1.4), saturation=(0.6, 1.4))
     torchvision.transforms.ToTensor()
     torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

     validation:
     torchvision.transforms.Resize(256)
     torchvision.transforms.CenterCrop(224)
     torchvision.transforms.ToTensor()
     torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

  2. For loss you use label smoothing factor of 0.1 on CrossEntropyLoss

  3. For final fully connected layer you use dropout whose ratio = 0.2

  4. For the SGD optimizer you use momentum=0.9, weight_decay=0.0001, nesterov=False, loss_scale=1024. I don't know what the role of loss_scale is; is it related to mixed precision training?

  5. For the learning rate with SGD you use warmup_lr=0.0001 for warmup_epochs=3; after that, base_lr=0.01 decayed by a factor of 0.1 every 30 epochs, with batch size=128 and total epochs=200. A consolidated sketch of items 1-5 follows this list.
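To make sure I've read your code correctly, here is a minimal PyTorch sketch of how I would wire items 1-5 together (the ghostnet() constructor call, the exact warmup interpolation, and leaving out loss_scale are my own assumptions):

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# 1. Data augmentation (ImageNet mean/std)
normalize = T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
train_tf = T.Compose([
    T.RandomResizedCrop((224, 224), scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                        interpolation=T.InterpolationMode.BICUBIC),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=(0.6, 1.4), contrast=(0.6, 1.4), saturation=(0.6, 1.4)),
    T.ToTensor(),
    normalize,
])
val_tf = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(), normalize])

# 2. Label smoothing of 0.1 on the cross-entropy loss (needs PyTorch >= 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# 3. Model with dropout 0.2 before the final FC layer; in practice I would call
#    model = ghostnet(num_classes=1000, dropout=0.2) from ghostnet_pytorch (assumed API).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # placeholder

# 4. SGD optimizer (loss_scale omitted; it looks MindSpore/AMP-specific)
base_lr, warmup_lr, warmup_epochs = 0.01, 1e-4, 3
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4, nesterov=False)

# 5. Linear warmup from warmup_lr to base_lr over 3 epochs,
#    then decay by 0.1 every 30 epochs (batch size 128, 200 epochs total)
def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (warmup_lr + (epoch / warmup_epochs) * (base_lr - warmup_lr)) / base_lr
    return 0.1 ** ((epoch - warmup_epochs) // 30)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```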

Am I correct? I'm trying to build my research on top of yours, so your help would mean a lot to me. Many thanks!

iamhankai commented 3 years ago

This script is correct: https://gitee.com/mindspore/mindspore/blob/master/model_zoo/research/cv/tinynet/script/train_distributed_gpu.sh

In addition, loss_scale is related to mixed precision training in Mindspore.
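For readers more familiar with PyTorch, a fixed loss scale plays roughly the same role as gradient scaling in AMP; the sketch below is only an illustration of the idea, not the MindSpore code:

```python
import torch
import torch.nn.functional as F

# Static loss scaling: multiply the loss by a large constant before backward()
# so small half-precision gradients do not underflow, then unscale the
# gradients before the optimizer step. (Illustration only; toy model.)
scale = 1024.0
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss = F.cross_entropy(model(x), y)

optimizer.zero_grad()
(loss * scale).backward()            # scale up before backprop
for p in model.parameters():         # unscale gradients before the update
    if p.grad is not None:
        p.grad.div_(scale)
optimizer.step()
```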

seanzhuh commented 3 years ago

Thanks a lot, mate! I see that you use the same hyper-parameters across the different optimizers, except that the number of epochs differs when using distributed training. By the way, am I correct about the data augmentation, label smoothing and dropout ratio?

iamhankai commented 3 years ago

Yes, your data augmentation, label smoothing and dropout ratio are correct.

seanzhuh commented 3 years ago

That's a huge help, thanks! I'll close this issue.

seanzhuh commented 3 years ago

Hi, I'm reopening this issue because I can't find the training hyper-parameters for CIFAR-10. Could you please provide a minimal working setting?

iamhankai commented 3 years ago

lr=0.1, 400 epochs, decay by 0.1 at epochs 200, 300 and 375, weight decay 5e-4.
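In PyTorch this schedule would look roughly like the sketch below (the momentum value and plain SGD are assumptions):

```python
import torch

# CIFAR-10 recipe: lr=0.1 for 400 epochs, decayed by 0.1 at epochs 200/300/375,
# weight decay 5e-4. Momentum 0.9 is assumed.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))  # placeholder; use GhostNet in practice
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[200, 300, 375], gamma=0.1)

for epoch in range(400):
    # ... one training epoch over CIFAR-10 here ...
    scheduler.step()
```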