igul222 / improved_wgan_training

Code for reproducing experiments in "Improved Training of Wasserstein GANs"
MIT License

A question about the structure of resnet #27

Closed · mathfinder closed this issue 7 years ago

mathfinder commented 7 years ago

Hi, thanks for your code. I have a question about the structure of the ResNet: I find that the residual block's output is `shortcut + (0.3 * output)` instead of `shortcut + output`. Is there any theoretical basis for this, or is it an experimental conclusion? It is not the same as the original ResNet.

Also, the code is easy to read, but there is one place I don't understand: `gen_64x64.py` line 530, `_dev_disc_cost = session.run(disc_cost, feed_dict={all_real_data_conv: _data})`. Should it be `_dev_disc_cost = session.run(disc_cost, feed_dict={all_real_data_conv: images})`? Thanks.

LynnHo commented 7 years ago

I have another question about the ResNet:

def ResnetGenerator(n_samples, noise=None, dim=DIM):
... output = tf.tanh(output / 5.)

Why `/ 5.`?

igul222 commented 7 years ago

Re. scaling the residual by 0.3: it's a stability trick from https://arxiv.org/abs/1602.07261 (section 3.3) which I use by default. Should work fine without it though.
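For concreteness, here is a minimal sketch of that scaled-residual trick in TF 1.x style. The layer calls and the block structure below are illustrative only (the repo's actual residual blocks also include normalization and resampling); the point is just where the 0.3 factor goes:

```python
import tensorflow as tf

def scaled_residual_block(inputs, dim, scale=0.3):
    # Residual branch: two 3x3 convolutions (illustrative, not the
    # repo's exact helper).
    output = tf.layers.conv2d(inputs, dim, 3, padding='same',
                              activation=tf.nn.relu)
    output = tf.layers.conv2d(output, dim, 3, padding='same')
    # Scale the residual branch before adding it to the shortcut,
    # per section 3.3 of the Inception-v4 paper: damping the residual
    # keeps early updates small and stabilizes training.
    return inputs + scale * output
```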

Re. dividing by 5, it's to (approximately) scale the variance of the outputs so the network doesn't start off in the saturated regime of the tanh. Again, you can probably do without, but this was the first thing I tried.
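A toy NumPy check of the saturation argument (mine, not from the repo): if the raw pre-activations have standard deviation around 5, most tanh units start out flat with near-zero gradient, while dividing by 5 keeps them in the responsive regime:

```python
import numpy as np

x = np.random.randn(1_000_000) * 5.0      # hypothetical raw pre-activations
grad = lambda z: 1.0 - np.tanh(z) ** 2    # derivative of tanh

# Without scaling: most units sit where tanh is saturated.
print(np.mean(np.abs(np.tanh(x)) > 0.99))        # ~0.60
print(np.mean(grad(x)))                          # small average gradient

# With the / 5. scaling: pre-activations have roughly unit variance,
# so tanh stays (approximately) non-saturated at initialization.
print(np.mean(np.abs(np.tanh(x / 5.0)) > 0.99))  # ~0.01
print(np.mean(grad(x / 5.0)))                    # much larger
```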

igul222 commented 7 years ago

(Fixed the _dev_disc_cost bug, thanks!)
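For anyone hitting the same line: the fix amounts to feeding the dev images being iterated over rather than the last training batch. A sketch using the names from the snippet quoted above (the surrounding loop and the `dev_gen` iterator are my reconstruction, not the repo's exact code):

```python
dev_disc_costs = []
for images in dev_gen():  # hypothetical iterator over the dev set
    _dev_disc_cost = session.run(
        disc_cost,
        feed_dict={all_real_data_conv: images},  # was: _data (the bug)
    )
    dev_disc_costs.append(_dev_disc_cost)
print('dev disc cost:', sum(dev_disc_costs) / len(dev_disc_costs))
```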