Thanks for your reply.
1) I used TensorFlow, which does not support reflect or symmetric padding on TPUs: the padding op itself is supported, but its gradient is not defined for TPUs.
2) The learning rate starts at 2e-4 and decays down to 1e-6 towards the end of training (a sketch of such a schedule is after this list).
3) Just out of curiosity, would adding extra channels help (4 or 5 input channels instead of 3)?
4) I did not experiment with architectures; I just used the ResNet generator with a PatchGAN discriminator and the LSGAN loss. I mainly wanted to see how a larger batch size affects training in terms of speed, quality of the generated images, and so on.
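A decay like that can be written as a built-in Keras schedule. This is only a minimal sketch: the comment above gives just the two endpoints, so the linear shape, the step count, and the optimizer settings here are assumptions for illustration.

```python
import tensorflow as tf

TOTAL_STEPS = 10_000  # placeholder: set to epochs * steps_per_epoch for the actual run

# Linear decay from 2e-4 down to 1e-6 over training; only the endpoints
# were stated above, so power=1.0 (a straight ramp) is an assumption.
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-4,
    decay_steps=TOTAL_STEPS,
    end_learning_rate=1e-6,
    power=1.0,
)

# beta_1=0.5 is the usual GAN setting, assumed here rather than stated above.
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule, beta_1=0.5)
```

Any other monotone shape (exponential, staircase) would also match the stated endpoints; the schedule object just gets passed to the optimizer and is stepped once per update.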
Thanks. I did try a batch size of 32; I think it was possible to go up to 56 with minor changes to the architecture, but beyond that there were memory issues. The results were decent, the main advantage being that training completes in about an hour and a half.
I tried training on the horses-to-zebras dataset using Colab's TPU with a batch size of 32 (4 per TPU core). The main differences were that I used zero padding (since reflect padding is not a supported TPU op), uniform kernel sizes (either 3 or 5), and transposed conv2d with stride 2 for upsampling. I trained for about 300 epochs, each taking around 30 seconds, and got reasonable results. Also, I did not use an image pool; the discriminator was trained on the current mini-batch images only (both pieces are sketched below). Some observations:
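For concreteness, here is a minimal sketch of the upsampling block and the no-image-pool LSGAN discriminator loss described above. The normalization and activation choices are assumptions on my part, not confirmed details of the run.

```python
import tensorflow as tf
from tensorflow.keras import layers

def upsample_block(filters, kernel_size=3):
    # Transposed conv with stride 2 and zero ("same") padding, as described
    # above; kernel_size is 3 or 5. Batch norm + ReLU are assumptions here
    # (CycleGAN implementations often use instance norm instead).
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, kernel_size, strides=2,
                               padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

mse = tf.keras.losses.MeanSquaredError()

def discriminator_loss(d_real, d_fake):
    # LSGAN objective computed on the current mini-batch only,
    # with no image pool / history buffer of past generated samples.
    return 0.5 * (mse(tf.ones_like(d_real), d_real) +
                  mse(tf.zeros_like(d_fake), d_fake))
```

Skipping the image pool means the discriminator only ever sees the freshest generated samples, which trades some of the stabilization the history buffer provides for a simpler input pipeline.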