What does "choosing initial hidden layer" mean? Do you mean initializing the parameters of the hidden layer in the Oracle and the Generator? Their parameter initialization is not exactly the same: Normal(0, 1) for the Oracle, Normal(0, 0.1) for the Generator. I think the reason both the Oracle and the Generator use Normal initialization is that it makes it easier for the Generator to learn the real data distribution.
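For illustration, a minimal sketch of what such parameter initialization might look like in PyTorch; the helper `init_normal_params` is hypothetical, and the actual repository may initialize only a subset of the parameters:

```python
import torch.nn as nn

def init_normal_params(model: nn.Module, std: float) -> None:
    # Hypothetical helper: draw every parameter from Normal(0, std).
    # The real repo code may restrict this to specific layers.
    for param in model.parameters():
        nn.init.normal_(param, mean=0.0, std=std)

# Per the comment above: Oracle uses std=1.0, Generator uses std=0.1.
# init_normal_params(oracle, std=1.0)
# init_normal_params(generator, std=0.1)
```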
In fact, it is hard to say exactly what makes the generated samples diverse and different. But clearly, without the Gumbel-Softmax trick, RelGAN would suffer severe mode collapse.
For example, the `sample` function of `RelGAN_G` calls `self.init_hidden`, which returns a matrix representing the "memory", like this:
    tensor([[[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]],
            ...,
            [[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]]])
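For reference, a deterministic initialization consistent with that printout could look like the following sketch; the shapes and the function body are my assumptions, not the repository's actual `init_hidden`:

```python
import torch

def init_hidden(batch_size: int, hidden_dim: int) -> torch.Tensor:
    # Deterministic "memory": the same one-hot row for every sample on
    # every call, matching the printout above -- no randomness involved.
    memory = torch.zeros(batch_size, 1, hidden_dim)
    memory[:, :, 0] = 1.0
    return memory

# init_hidden(4, 8) returns an identical tensor each time it is called.
```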
In fact, I was looking for such random initializations in the `sample` function, and I found out that the only random part is the output of `add_gumbel`, since it adds a random vector before the softmax; I didn't find the Normal distribution initialization you are mentioning.
It looks like the initial hidden layer in both the LSTM Generator and `RelGAN_G` is chosen to be the same every time. Wouldn't that reduce the diversity of the generated samples? What ensures that the samples are diverse and different?
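For context, here is my understanding of how an `add_gumbel`-style perturbation typically works; this is the standard Gumbel-Max formulation written as a sketch, not necessarily the exact code in this repo:

```python
import torch
import torch.nn.functional as F

def add_gumbel(logits: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    # Draw Gumbel(0, 1) noise via the inverse-CDF trick and add it to the
    # logits, so each forward pass perturbs the distribution differently.
    u = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(u + eps) + eps)
    return logits + gumbel_noise

def gumbel_softmax_sample(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Softmax over the perturbed logits; lower temperature pushes the
    # result toward a one-hot token choice.
    return F.softmax(add_gumbel(logits) / temperature, dim=-1)
```

If that reading is right, the diversity of the samples would come entirely from this noise rather than from the hidden-state initialization.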