tensorflow / benchmarks

A benchmark framework for TensorFlow
Apache License 2.0
1.15k stars, 633 forks

Accuracy of VGG-11, VGG-16, and VGG-19 models does not increase #238

Open wx1111 opened 6 years ago

wx1111 commented 6 years ago

Hi, I ran the benchmark on my cluster with the VGG-11, VGG-16, and VGG-19 models in distributed mode (1 ps and 4 workers). The accuracy does not increase. The optimizer settings are:

optimizer = rmsprop
init_learning_rate = 0.01
num_epochs_per_decay = 5
learning_rate_decay_factor = 0.95
momentum = 0.9

The batch_size is the default value. The accuracy results are as follows:

vgg11:

Step    Img/sec    total_loss    top_1_accuracy    top_5_accuracy
9500    images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.897    0.000    0.008
9600    images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.909    0.000    0.004
9700    images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.909    0.000    0.004
9800    images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.910    0.000    0.004
9900    images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.917    0.004    0.004
10000   images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.903    0.000    0.000
10100   images/sec: 31.4 +/- 0.0 (jitter = 0.7)    6.910    0.000    0.008

vgg16 and vgg19 show almost the same accuracy.

Any ideas? Thanks a lot!

reedwm commented 6 years ago

Unfortunately, no one is working on either distributed convergence/performance or the VGG models, so we currently have no one to look into this. At the moment, our focus is on single-machine resnet50.
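
As a rough reading aid for the vgg11 log in the original post, the sketch below compares the logged total_loss values with what uniform random guessing would give. It assumes a 1000-class (ImageNet-style) label set, which the thread does not state; `NUM_CLASSES` and `chance_baseline` are hypothetical names introduced only for this illustration. The logged total_loss may also include a regularization term, so the comparison is approximate rather than a diagnosis.

```python
import math

# Assumption (not stated in the thread): 1000 ImageNet-style classes.
NUM_CLASSES = 1000


def chance_baseline(num_classes, k=5):
    """Return (cross-entropy loss, top-1, top-k accuracy) for uniform guessing."""
    loss = math.log(num_classes)          # -ln(1 / num_classes)
    return loss, 1.0 / num_classes, k / num_classes


# total_loss values copied from the vgg11 log in the original post.
logged_losses = [6.897, 6.909, 6.909, 6.910, 6.917, 6.903, 6.910]

chance_loss, chance_top1, chance_top5 = chance_baseline(NUM_CLASSES)
mean_loss = sum(logged_losses) / len(logged_losses)

print(f"chance-level loss:  {chance_loss:.3f}")
print(f"mean logged loss:   {mean_loss:.3f}")
print(f"chance top-1 / top-5 accuracy: {chance_top1:.3f} / {chance_top5:.3f}")
```

Under that assumption, the logged loss sits within about 0.01 of ln(1000), and the logged top-1 and top-5 accuracies hover around the chance values. This would be consistent with the network producing near-uniform predictions rather than slowly converging.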