Hi,
I was wondering if anyone has tried customizing the code for multi-GPU training instead of TPUs. The current code works on a single GPU without many modifications (set use_tpu = False). However, I am running into trouble getting it to run on multiple GPUs.
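For reference, this is roughly the single-GPU setup that works for me (a sketch rather than my exact script; I am assuming the stock tf.contrib.tpu.TPUEstimator path, and model_fn, train_input_fn, num_train_steps and the FLAGS all come from the original code):

import tensorflow as tf

# Single-GPU path: keep the original TPUEstimator and just disable the TPU.
run_config = tf.contrib.tpu.RunConfig(
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.iterations_per_loop,
    keep_checkpoint_max=5)
estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=False,  # falls back to plain CPU/GPU execution
    model_fn=model_fn,
    config=run_config,
    train_batch_size=FLAGS.batch_size)
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)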
I changed the configuration as follows (tensorflow-gpu 1.13.1):
distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=FLAGS.num_gpus)
run_config = tf.estimator.RunConfig(
    log_step_count_steps=10,
    save_summary_steps=10,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.iterations_per_loop,
    keep_checkpoint_max=5,
    train_distribute=distribution)
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    model_dir=FLAGS.output_dir,
    params={'batch_size': FLAGS.batch_size})
estimator.train(input_fn=train_input_fn, steps=num_train_steps)
However, I get the following error:

raise ValueError("You must specify an aggregation method to update a "
ValueError: You must specify an aggregation method to update a MirroredVariable in Replica Context.
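If it helps, here is a minimal, self-contained sketch (a hypothetical toy model, not code from this repo) of the pattern that seems to trigger this ValueError: assigning to a mirrored variable inside model_fn, i.e. in replica context, without an aggregation method. Manually bumping the global step after the optimizer runs is a common example of this pattern:

import tensorflow as tf

def toy_model_fn(features, labels, mode, params):
    # Toy linear model so the sketch is self-contained.
    w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
    loss = tf.reduce_mean(tf.square(w * features - labels))
    global_step = tf.train.get_or_create_global_step()
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    # Under MirroredStrategy this assign runs once per replica, and the
    # mirrored global step has no aggregation method -> ValueError.
    train_op = tf.group(train_op, global_step.assign(global_step + 1))
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def toy_input_fn():
    # Constant (x, y) pairs, repeated and batched.
    ds = tf.data.Dataset.from_tensors((tf.constant(1.0), tf.constant(2.0)))
    return ds.repeat().batch(8)

distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
config = tf.estimator.RunConfig(train_distribute=distribution)
tf.estimator.Estimator(model_fn=toy_model_fn, config=config).train(
    input_fn=toy_input_fn, steps=10)

If that is indeed the cause here, two workarounds come to mind (untested on my side, so treat them as assumptions): let the optimizer increment the step itself, e.g. optimizer.minimize(loss, global_step=global_step), and drop the manual assign; or create the offending variable with an explicit aggregation, e.g. tf.get_variable(..., aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA).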
Has anyone found a solution to this? Thanks.