tensorflow / benchmarks

A benchmark framework for TensorFlow
Apache License 2.0

only dump timeline and partition_graphs when is_chief == True #472

Closed zhuzilin closed 4 years ago

zhuzilin commented 4 years ago

This prevents several processes from writing to the same timeline.json when using Horovod.

reedwm commented 4 years ago

Dumping the timeline for each worker can be useful if you want to collect performance data for each worker. Is it possible to pass different command line flags to each Horovod process?

zhuzilin commented 4 years ago

> Dumping the timeline for each worker can be useful if you want to collect performance data for each worker. Is it possible to pass different command line flags to each horovod process?

@reedwm I'm afraid we can't. Because Horovod is launched with mpirun, we need to pass the same parameters to all processes.

reedwm commented 4 years ago

In that case, let's only do this if Horovod is used. That is, we dump the trace if is_chief or self.params.variable_update != 'horovod'. Instead of passing both variable_update and is_chief, add a single should_output_files parameter.
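A minimal sketch of the condition described above, written as standalone Python rather than the actual benchmark code. The helper names `compute_should_output_files` and `maybe_write_timeline` are hypothetical and only illustrate how a single `should_output_files` flag could replace passing both `is_chief` and `variable_update`; the real functions in the repository may be shaped differently.

```python
import json


def compute_should_output_files(is_chief: bool, variable_update: str) -> bool:
    """Decide whether this process may write timeline/partition-graph files.

    Hypothetical helper: under Horovod every process receives the same
    flags, so only the chief writes. In any other mode each worker keeps
    writing its own files, as before.
    """
    return is_chief or variable_update != "horovod"


def maybe_write_timeline(path: str, trace: dict, should_output_files: bool) -> bool:
    """Write `trace` as JSON to `path` only when allowed.

    Hypothetical helper: the caller passes the single precomputed flag
    instead of both is_chief and variable_update. Returns True if the
    file was written.
    """
    if not should_output_files:
        return False
    with open(path, "w") as f:
        json.dump(trace, f)
    return True
```

With this shape, the flag would be computed once where `is_chief` and `params.variable_update` are both known, and the dumping code would only need to check `should_output_files`.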

zhuzilin commented 4 years ago

> Instead of passing both variable_update and is_chief, add a single should_output_files parameter.

I'd love to. :ok_hand:

zhuzilin commented 4 years ago

@reedwm I've just added the should_output_files parameter. Could you take another look?