rootkitchao / mnasnet_tensorflow

MNASNET Tensorflow
Apache License 2.0

About the train config #1

Open holyhao opened 5 years ago

holyhao commented 5 years ago

Thanks for your work, it is excellent. You achieve 74.09% top-1, which is slightly better than the reported 74.08%. How did you get such good results? I want to try it with different input sizes and depth multipliers on ImageNet, as well as object detection with SSD and segmentation with DeepLab. So, could you please share the train config, so we can be clear about how MnasNet is set up for the different tasks?

rootkitchao commented 5 years ago

I just used the hyperparameters given in the paper. First I linearly increase the learning rate from 0 to 0.256*2 over 5 epochs.

python train_image_classifier.py --train_dir=D:\tf_project\imagenet\model\train --dataset_name=imagenet --dataset_split_name=train --dataset_dir=D:\dataset\imagenet1k --model_name=mnasnet_a1 --num_clones=2 --train_image_size=224 --label_smoothing=0.1 --moving_average_decay=0.9999 --weight_decay=0.00001 --batch_size=96 --learning_rate_decay_type=polynomial --learning_rate=0 --end_learning_rate=0.512 --learning_rate_decay_factor=1 --num_epochs_per_decay=0.5 --max_number_of_steps=6673 --preprocessing_name=inception_v2
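For clarity, here is a small sketch of the warm-up schedule those flags produce (my own illustration, not code from this repo), assuming the usual 1,281,167 ImageNet-1k training images and TF-Slim's polynomial decay with power=1 and cycle=False:

# Hypothetical sketch, not the repo's code: the warm-up learning rate implied
# by the command above (polynomial decay, power=1, cycle=False).
IMAGENET_TRAIN_SIZE = 1281167  # assumed ImageNet-1k training set size

def warmup_lr(global_step,
              learning_rate=0.0,
              end_learning_rate=0.512,
              num_epochs_per_decay=0.5,
              batch_size=96,
              num_samples_per_epoch=IMAGENET_TRAIN_SIZE,
              power=1.0):
    decay_steps = int(num_samples_per_epoch * num_epochs_per_decay / batch_size)
    step = min(global_step, decay_steps)   # clamps at decay_steps
    frac = 1.0 - step / decay_steps
    return (learning_rate - end_learning_rate) * (frac ** power) + end_learning_rate

# The rate rises linearly from 0 at step 0 to 0.512 at roughly step 6673,
# which matches --max_number_of_steps=6673 in the command.
print(warmup_lr(0), warmup_lr(3336), warmup_lr(6673))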

Then train enough steps according to the hyperparameters in the paper.

python train_image_classifier.py --train_dir=D:\tf_project\imagenet\model\train --dataset_name=imagenet --dataset_split_name=train --dataset_dir=D:\dataset\imagenet1k --model_name=mnasnet_a1 --num_clones=2 --train_image_size=224 --label_smoothing=0.1 --moving_average_decay=0.9999 --weight_decay=0.00001 --batch_size=96 --learning_rate_decay_type=exponential --learning_rate=0.512 --learning_rate_decay_factor=0.97 --num_epochs_per_decay=1.2 --max_number_of_steps=4500000 --preprocessing_name=inception_v2

During training I may have forgotten to set weight_decay=0.00001 and used the default value instead. There are only two graphics cards on my workstation, so the values of learning_rate, num_epochs_per_decay, and max_number_of_steps need to be adjusted according to the number of graphics cards.
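For illustration only (this helper is not part of the repo), here is one way to scale those flags for a different GPU count, assuming TF-Slim's decay_steps formula ignores num_clones and keeping the total amount of data seen equal to my 2-GPU run:

# Hypothetical helper: scale the second-stage flags for a different number of GPUs,
# assuming the paper's per-GPU base rate of 0.256 and a decay every 2.4 data-epochs.
def second_stage_flags(num_gpus, batch_size=96):
    # Keep the same total data seen as the 2-GPU run above
    # (4,500,000 steps * 96 images * 2 clones).
    total_images = 4_500_000 * 96 * 2
    return {
        "learning_rate": 0.256 * num_gpus,        # linear LR scaling with clones
        "num_epochs_per_decay": 2.4 / num_gpus,   # decay_steps formula ignores num_clones
        "max_number_of_steps": total_images // (batch_size * num_gpus),
    }

# Sanity check: for 2 GPUs this reproduces 0.512 / 1.2 / 4,500,000 as in the command.
print(second_stage_flags(num_gpus=2))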

In addition, I tried using the MobileNet V2 hyperparameters, but the accuracy only reached about 73.5%. It seems that Google's implementation achieved better accuracy (74.4%); perhaps it can be reached by increasing the number of training steps.

holyhao commented 5 years ago

@rootkitchao Thanks for your work. By the way, doesn't the warm-up train config need to set something like train_num_steps? And the second training stage is 4.5M steps, which is very long and will take nearly 2 months.

Hi, correct me if I am wrong. In the warm-up part, num_epochs_per_decay = 1/gpus makes the learning rate rise from 0 to 0.256×gpus in just one epoch, not five, and training does not stop once the learning rate reaches 0.256×gpus (the decayed rate simply holds there, since global_step is clamped at decay_steps):

decay_steps = int(num_samples_per_epoch * FLAGS.num_epochs_per_decay /
                  FLAGS.batch_size)

polynomial_decay (with cycle=False, power=1):
global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) *
                        (1 - global_step / decay_steps) ^ power + end_learning_rate
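Plugging in the numbers from the warm-up command (a back-of-envelope check, assuming 1,281,167 ImageNet-1k training images):

num_samples_per_epoch = 1281167   # assumed ImageNet-1k training set size
batch_size = 96
num_clones = 2                    # two GPUs
num_epochs_per_decay = 0.5

decay_steps = int(num_samples_per_epoch * num_epochs_per_decay / batch_size)
images_per_step = batch_size * num_clones
epochs_at_full_lr = decay_steps * images_per_step / num_samples_per_epoch

print(decay_steps)        # ~6672, matching --max_number_of_steps=6673 up to rounding
print(epochs_at_full_lr)  # ~1.0, so the ramp to 0.512 takes about one epoch, not five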