mathfinder closed this issue 7 years ago
I have another question about resnet:
```python
def ResnetGenerator(n_samples, noise=None, dim=DIM):
    ...
    output = tf.tanh(output / 5.)
```
Why the `/ 5.`?
Re. scaling the residual by 0.3: it's a stability trick from https://arxiv.org/abs/1602.07261 (section 3.3) which I use by default. Should work fine without it though.
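For anyone else reading along, the scaled-residual trick can be sketched like this (a minimal numpy sketch; `residual_block` and `f` are illustrative names, not identifiers from this repo):

```python
import numpy as np

def residual_block(x, f, scale=0.3):
    """Shortcut plus a down-scaled residual branch.

    Scaling the residual by a small factor (~0.1-0.3, per Inception-v4,
    arXiv:1602.07261, sec. 3.3) keeps the block close to the identity
    early in training, which tends to stabilize very deep networks.
    """
    return x + scale * f(x)

x = np.array([1.0, -2.0])
# residual branch is the identity here, so y = x + 0.3 * x
y = residual_block(x, lambda v: v)  # -> [1.3, -2.6]
```

With `scale=1.0` this reduces to the standard `shortcut + output` block from the original resnet paper.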
Re. dividing by 5, it's to (approximately) scale the variance of the outputs so the network doesn't start off in the saturated regime of the tanh. Again, you can probably do without, but this was the first thing I tried.
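To see the saturation point concretely, here's a quick numpy check (the 3.0 standard deviation is just an illustrative stand-in for whatever pre-activation scale an untrained net produces):

```python
import numpy as np

rng = np.random.default_rng(0)
# pretend pre-activations from an untrained net with fairly large scale
pre = rng.normal(0.0, 3.0, size=100_000)

# fraction of outputs pushed into the flat (saturated) part of tanh,
# where gradients are near zero
sat_raw = np.mean(np.abs(np.tanh(pre)) > 0.99)
sat_scaled = np.mean(np.abs(np.tanh(pre / 5.)) > 0.99)

print(sat_raw, sat_scaled)  # a large fraction vs. essentially none
```

Dividing by 5 shrinks the inputs toward the roughly linear region of tanh around zero, so gradients flow at the start of training.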
(Fixed the `_dev_disc_cost` bug, thanks!)
Hi, thanks for your code. I have a question about the structure of the resnet: I find that the residual block's output is `shortcut + (0.3 * output)` instead of `shortcut + output`. Is there a theoretical basis for this, or is it an empirical choice? It is not the same as the original resnet.
Also, the code is easy to read, but there is one place I don't understand: gen_64x64.py line 530, `_dev_disc_cost = session.run(disc_cost, feed_dict={all_real_data_conv: _data})`. Should it be `_dev_disc_cost = session.run(disc_cost, feed_dict={all_real_data_conv: images})`? Thanks.