jppgks closed this issue 6 years ago
Hi Joppe,
the training of a single GAN is done on a single GPU (it's relatively fast for the architecture and datasets that we used).
We launched multiple experiments in parallel: first we ran compare_gan_generate_tasks
to create the set of experiments to run, then we ran compare_gan_run_one_task
on many machines (machine 0 with task_num=0, machine 1 with task_num=1, etc.).
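For reference, that workflow looks roughly like the sketch below. The flag names (--workdir, --experiment, --task_num) are taken from the v1 README as I remember them, so double-check them there before running anything:

```shell
# Generate the full set of tasks once, into a work directory all machines can read.
# (Flag names as remembered from the v1 README -- verify against the actual CLI.)
compare_gan_generate_tasks --workdir=/tmp/results --experiment=test

# Then each machine runs exactly one task; no coordination between machines is needed.
compare_gan_run_one_task --workdir=/tmp/results --task_num=0   # machine 0
compare_gan_run_one_task --workdir=/tmp/results --task_num=1   # machine 1
# ... and so on, one task_num per machine.
```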
@jppgks Comparing multiple experiments in parallel is not the same as distributed training, unless hyper-parameter optimization is the end goal. Is that what you mean by multiple tasks?
Note: We have updated the framework in the meantime and it now supports distributed training (single run on multiple machines) for TPUs.
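For anyone finding this later: in the rewritten framework a single run is driven by compare_gan/main.py together with a gin config. The sketch below is only a guess at the invocation; the config path and the TPU-related flags are assumptions, so check main.py and the README for the real flag names:

```shell
# Rough sketch only: the flag names and config path are assumptions, verify them
# against compare_gan/main.py and the README before running anything.
# (There will also be a flag for pointing the job at your TPU; check main.py for its name.)
python compare_gan/main.py \
  --model_dir=gs://my-bucket/experiment_1 \
  --gin_config=example_configs/resnet_cifar10.gin \
  --use_tpu=true
```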
@Marvin182 where can I find this in the code?
Thanks for open sourcing the code for this awesome paper!
I’m wondering if you used distributed training of the different GAN models during experimentation. If so, could you share an example of how to launch a distributed training job using the compare_gan code?