tensorflow / benchmarks

A benchmark framework for TensorFlow
Apache License 2.0

Gradients are not averaged in AllReduceSpec=nccl (variable_update=replicated) mode #466

Closed: zhao1157 closed this issue 4 years ago

zhao1157 commented 4 years ago

https://github.com/tensorflow/benchmarks/blob/5d03cf8e356d2ae17df440cdb612c378cbacf5ef/scripts/tf_cnn_benchmarks/batch_allreduce.py#L376

In AllReduceSpec=nccl (variable_update=replicated) mode, the gradients are summed, after which they should be averaged. But as far as I know, they are not averaged before the variables are updated. Did I get that right?
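
For illustration, here is a minimal sketch of the arithmetic in question. This is not the actual batch_allreduce.py code path; the device count and gradient values are made up, and each list entry stands in for one GPU's gradient of a single variable.

```python
import tensorflow as tf

num_gpus = 4
# Hypothetical per-GPU gradients for the same variable.
per_gpu_grads = [tf.constant([1.0, 2.0]) * (i + 1) for i in range(num_gpus)]

# What the NCCL all-reduce effectively produces: an elementwise sum.
summed_grad = tf.add_n(per_gpu_grads)    # [10.0, 20.0]

# The extra step I would have expected before the variables are updated:
averaged_grad = summed_grad / num_gpus   # [2.5, 5.0]
```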

reedwm commented 4 years ago

Correct, the gradients are summed instead of averaged. This is true with variable_update=replicated regardless of what the AllReduceSpec is.

Arguably, summing instead of averaging is not a bug. The tf.distribute API also sums gradients instead of averaging them, and expects you to divide the per-replica loss by the number of replicas to compensate. We don't average because averaging has a performance cost over summing. However, variable_update=parameter_server averages gradients instead, IIRC, and that inconsistency is a bug.
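
As a hedged sketch of that tf.distribute convention (the tiny model, loss, and GLOBAL_BATCH_SIZE below are made up for illustration, not taken from tf_cnn_benchmarks):

```python
import tensorflow as tf

GLOBAL_BATCH_SIZE = 64  # made-up value for illustration
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD()

def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        per_example_loss = tf.keras.losses.mean_squared_error(labels, predictions)
        # Divide by the *global* batch size rather than the per-replica batch
        # size, so that the gradients summed across replicas come out
        # equivalent to an average.
        loss = tf.nn.compute_average_loss(
            per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# train_step would be launched on every replica via strategy.run(...).
```

Because the loss is already scaled by 1/GLOBAL_BATCH_SIZE, summing the per-replica gradients gives the same result as averaging would, which is why the summing behavior is not treated as a bug there.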

Unfortunately, this will not get fixed since tf_cnn_benchmarks is unmaintained.