tensorflow / benchmarks

A benchmark framework for Tensorflow
Apache License 2.0

Accuracy is lower when the program is run with Horovod #517

Open lljjgg opened 3 years ago

lljjgg commented 3 years ago

When I run the program with the following command, the final test accuracy is 75.96%:

    python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
        --model=resnet50 --optimizer=momentum --variable_update=replicated \
        --nodistortions --gradient_repacking=8 --num_gpus=8 \
        --num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
        --train_dir=${CKPT_DIR}

But when I run it with Horovod using the following command, the final test accuracy is 74%:

    horovodrun -np 8 python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
        --model=resnet50 --optimizer=momentum --variable_update=horovod \
        --nodistortions --gradient_repacking=8 --num_gpus=8 \
        --num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
        --train_dir=${CKPT_DIR}

Is this a normal result, or is there an error in how I run the program with Horovod? Looking forward to your reply. Thank you.