What does "choosing initial hidden layer" mean? Do you mean initializing the parameters of the hidden layer in the Oracle and the Generator? Their parameter initialization is not exactly the same: Normal(0, 1) for the Oracle, Normal(0, 0.1) for the Generator. I think the reason both the Oracle and the Generator use Normal initialization is that it makes it easier for the Generator to learn the real data distribution.
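For illustration, a minimal sketch of what such parameter initialization might look like in PyTorch; the helper `init_normal_params` is hypothetical, and the actual repository may initialize only a subset of the parameters:

```python
import torch.nn as nn

def init_normal_params(model: nn.Module, std: float) -> None:
    # Hypothetical helper: draw every parameter from Normal(0, std).
    # The real repo code may restrict this to specific layers.
    for param in model.parameters():
        nn.init.normal_(param, mean=0.0, std=std)

# Per the comment above: Oracle uses std=1.0, Generator uses std=0.1.
# init_normal_params(oracle, std=1.0)
# init_normal_params(generator, std=0.1)
```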
In fact, it is hard to say exactly what makes the generated samples diverse and different. But clearly, without the Gumbel-Softmax trick, RelGAN would suffer severe mode collapse.
For example, the `sample` function of `RelGAN_G` calls `self.init_hidden`, which returns a matrix representing the "memory", like this:
    tensor([[[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]],
            ...,
            [[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]],
            [[1., 0., 0., ..., 0., 0., 0.]]])
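For reference, a deterministic initialization consistent with that printout could look like the following sketch; the shapes and the function body are my assumptions, not the repository's actual `init_hidden`:

```python
import torch

def init_hidden(batch_size: int, hidden_dim: int) -> torch.Tensor:
    # Deterministic "memory": the same one-hot row for every sample on
    # every call, matching the printout above -- no randomness involved.
    memory = torch.zeros(batch_size, 1, hidden_dim)
    memory[:, :, 0] = 1.0
    return memory

# init_hidden(4, 8) returns an identical tensor each time it is called.
```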
In fact, I was looking for such random initializations in the `sample` function, and I found out that the only random part is the output of `add_gumbel`, since it adds a random vector before the softmax; I didn't find the Normal distribution initialization you are mentioning.
It looks like the initial hidden layer in both the LSTM Generator and `RelGAN_G` is chosen to be the same every time. Wouldn't that reduce the diversity of the generated samples? What ensures that the samples are diverse and different?
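For context, here is my understanding of how an `add_gumbel`-style perturbation typically works; this is the standard Gumbel-Max formulation written as a sketch, not necessarily the exact code in this repo:

```python
import torch
import torch.nn.functional as F

def add_gumbel(logits: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    # Draw Gumbel(0, 1) noise via the inverse-CDF trick and add it to the
    # logits, so each forward pass perturbs the distribution differently.
    u = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(u + eps) + eps)
    return logits + gumbel_noise

def gumbel_softmax_sample(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Softmax over the perturbed logits; lower temperature pushes the
    # result toward a one-hot token choice.
    return F.softmax(add_gumbel(logits) / temperature, dim=-1)
```

If that reading is right, the diversity of the samples would come entirely from this noise rather than from the hidden-state initialization.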