seanzhuh closed this issue 3 years ago.
Please refer to the training settings in
Hi, thanks for your quick reply! I'm not familiar with your proposed TinyNet, so here are the best parameters I could infer from reading your GhostNet source code:
Note: the following are expressed with the PyTorch API.
For data augmentation you use:
Training:
torchvision.transforms.RandomResizedCrop((224, 224), scale=(0.08, 1.), ratio=(3./4., 4./3.), interpolation=Image.BICUBIC)
torchvision.transforms.RandomHorizontalFlip()
torchvision.transforms.ColorJitter(brightness=(0.6, 1.4), contrast=(0.6, 1.4), saturation=(0.6, 1.4))
torchvision.transforms.ToTensor()
torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
Validation:
torchvision.transforms.Resize(256)
torchvision.transforms.CenterCrop(224)
torchvision.transforms.ToTensor()
torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
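To make the RandomResizedCrop parameters above concrete, here is a minimal pure-Python sketch of how such a crop box can be sampled. It follows the logic of torchvision's `RandomResizedCrop.get_params` (area fraction drawn uniformly from `scale`, aspect ratio drawn log-uniformly from `ratio`, up to 10 attempts, then a center-crop fallback). The function name `sample_resized_crop_box` is hypothetical, and this is an illustration, not the GhostNet training code.

```python
import math
import random


def sample_resized_crop_box(width, height, scale=(0.08, 1.0),
                            ratio=(3.0 / 4.0, 4.0 / 3.0), rng=random):
    """Return (top, left, crop_h, crop_w) for a random-resized crop.

    Hypothetical helper: area fraction ~ Uniform(scale), aspect ratio
    ~ LogUniform(ratio), up to 10 tries, then a center-crop fallback.
    """
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: largest centered crop whose aspect ratio stays inside `ratio`.
    in_ratio = width / height
    if in_ratio < ratio[0]:
        crop_w = width
        crop_h = int(round(crop_w / ratio[0]))
    elif in_ratio > ratio[1]:
        crop_h = height
        crop_w = int(round(crop_h * ratio[1]))
    else:
        crop_w, crop_h = width, height
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w


# Usage: a 500 x 375 (width x height) image with a seeded generator.
top, left, h, w = sample_resized_crop_box(500, 375, rng=random.Random(0))
```

The sampled box would then be cut out and resized to 224 x 224 with bicubic interpolation. At validation time, Resize(256) followed by CenterCrop(224) instead keeps the central 224 / 256 = 87.5% of the shorter side.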
For the loss, you use CrossEntropyLoss with a label smoothing factor of 0.1.
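A small plain-Python sketch of what a smoothing factor of 0.1 does to the cross-entropy target. It assumes the convention of PyTorch's `torch.nn.CrossEntropyLoss(label_smoothing=0.1)`, where ε/K is spread over all K classes. Some codebases spread ε over the K-1 non-target classes instead, so the MindSpore formulation used for GhostNet may differ slightly. The function name `smoothed_cross_entropy` is hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy of `logits` against a label-smoothed one-hot target.

    The target class receives (1 - smoothing) + smoothing / K probability
    mass and every other class receives smoothing / K.
    """
    k = len(logits)
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / k + (1.0 - smoothing if i == target else 0.0)
        loss -= q * lp
    return loss


# A confident, correct prediction is penalized slightly more with smoothing.
print(smoothed_cross_entropy([5.0, 0.0, 0.0], 0))
print(smoothed_cross_entropy([5.0, 0.0, 0.0], 0, smoothing=0.0))
```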
For the final fully connected layer, you use dropout with a ratio of 0.2.
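For reference, a minimal sketch of inverted dropout as it would be applied to the features feeding the final fully connected layer, assuming 0.2 is the probability of *dropping* a unit. Some frameworks parameterize dropout by the keep probability instead (older MindSpore `nn.Dropout` takes `keep_prob`), in which case a drop ratio of 0.2 corresponds to `keep_prob=0.8`. The function name `dropout` here is a hypothetical stand-in, not a library call.

```python
import random


def dropout(features, p=0.2, training=True, rng=random):
    """Inverted dropout on a flat list of features.

    In training mode each element is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
    In evaluation mode the input is returned unchanged (as a copy).
    """
    if not training or p == 0.0:
        return list(features)
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in features]


# Roughly 20% of the entries become 0.0; the rest are scaled to 1.25.
out = dropout([1.0] * 10, p=0.2, rng=random.Random(0))
```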
For the SGD optimizer, you use momentum=0.9, weight_decay=0.0001, nesterov=False and loss_scale=1024. I don't know what loss_scale does; is it related to mixed precision training?
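A sketch of a single SGD step with these settings, assuming the classical (non-Nesterov) momentum update with L2 weight decay folded into the gradient, as in PyTorch's `torch.optim.SGD` with `dampening=0`. Parameters are plain floats here for readability, and `sgd_momentum_step` is a hypothetical name; loss scaling is deliberately left out of this step.

```python
def sgd_momentum_step(params, grads, bufs, lr, momentum=0.9,
                      weight_decay=1e-4):
    """Update `params` and momentum buffers `bufs` in place.

    For each parameter:
        d   = grad + weight_decay * param   (L2 weight decay)
        buf = momentum * buf + d            (classical momentum)
        param = param - lr * buf
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        d = g + weight_decay * p
        bufs[i] = momentum * bufs[i] + d
        params[i] = p - lr * bufs[i]


# Usage: one parameter, zero-initialized momentum buffer.
params, grads, bufs = [1.0], [0.5], [0.0]
sgd_momentum_step(params, grads, bufs, lr=0.01)
```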
For the SGD learning rate, you use warmup_lr=0.0001 with warmup_epochs=3; after warmup, base_lr=0.01, decayed by a factor of 0.1 every 30 epochs, with batch size=128 and 200 total epochs.
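A per-epoch sketch of this schedule. Several details are assumptions not stated in the thread: the warmup is taken to be linear from warmup_lr to base_lr, the 30-epoch decay is counted from epoch 0 rather than from the end of warmup, and the rate is computed per epoch (training scripts often compute it per iteration instead). `imagenet_lr` is a hypothetical name.

```python
def imagenet_lr(epoch, warmup_lr=1e-4, warmup_epochs=3, base_lr=0.01,
                decay_every=30, gamma=0.1):
    """Learning rate for a given (0-indexed) epoch.

    Assumed: linear warmup from warmup_lr to base_lr over warmup_epochs,
    then step decay by gamma every decay_every epochs counted from epoch 0.
    """
    if epoch < warmup_epochs:
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    return base_lr * gamma ** (epoch // decay_every)


schedule = [imagenet_lr(e) for e in (0, 1, 2, 3, 30, 60, 199)]
```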
Is this correct? I'm building my research on top of your work, so your help would mean a lot to me. Many thanks!
This script is correct:
In addition, loss_scale is related to mixed precision training in MindSpore.
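A toy simulation of why a static loss scale such as 1024 helps in mixed precision training: small gradients underflow to zero in float16, but scaling the loss (and therefore every gradient) before the fp16 backward pass, then dividing by the same factor in higher precision before the update, preserves them. Python's `struct` `"e"` format is used to round values to IEEE half precision; `to_fp16` and `scaled_grad_roundtrip` are hypothetical helpers, not MindSpore APIs.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_grad_roundtrip(grad, loss_scale=1024.0):
    """Simulate static loss scaling for a single gradient value.

    The gradient is stored in fp16 after being multiplied by loss_scale,
    then unscaled in double precision. Too large a scale would overflow
    fp16 (max about 65504), which is what dynamic loss scaling guards against.
    """
    return to_fp16(grad * loss_scale) / loss_scale


print(to_fp16(1e-8))                # underflows without scaling
print(scaled_grad_roundtrip(1e-8))  # survives with loss_scale=1024
```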
Thanks a lot, mate! I noticed that you use the same hyper-parameters for the different optimizers, except that the number of epochs differs for distributed training. By the way, am I correct about the data augmentation, label smoothing and dropout ratio?
Yes, your data augmentation, label smoothing and dropout ratio are correct.
It's a huge help, thanks! I'll close this issue.
Hi, I'm reopening this issue because I can't find the training hyper-parameters for CIFAR-10. Could you please provide a minimal working setting?
lr=0.1, 400 epochs, decayed by 0.1 at epochs 200, 300 and 375, weight decay 5e-4.
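The CIFAR-10 setting amounts to a multi-step schedule. A minimal sketch follows; in PyTorch the same curve should come from `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 300, 375], gamma=0.1)` stepped once per epoch. The weight decay of 5e-4 would be passed to the optimizer; batch size and momentum are not given in the reply, so they are not shown. `cifar10_lr` is a hypothetical name.

```python
from bisect import bisect_right


def cifar10_lr(epoch, base_lr=0.1, milestones=(200, 300, 375), gamma=0.1):
    """Multiply base_lr by gamma once for every milestone already reached."""
    return base_lr * gamma ** bisect_right(milestones, epoch)


# Learning rate at the start of each phase of the 400-epoch run.
phases = [cifar10_lr(e) for e in (0, 200, 300, 375)]
```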
Hi, thanks for sharing such wonderful work! I'd like to reproduce your results on ImageNet. Could you please specify the training parameters, such as the initial learning rate, how it is decayed, the batch size, etc.? It would be even better if you could share tricks for training GhostNet, such as label smoothing and data augmentation. Thanks!