ofirnachum / sequence_gan

Generative adversarial networks (GANs) applied to sequential data via recurrent neural networks (RNNs).

negative normalized_rewards #7

Closed: npark closed this issue 7 years ago

npark commented 7 years ago

Hi, the "normalized_rewards" value sometimes becomes negative. I think it is because "self.expected_reward" is larger than "rewards / _backwards_cumsum(decays, self.sequence_length)". Is this okay?

normalized_rewards = rewards / _backwards_cumsum(decays, self.sequence_length) - self.expected_reward
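For intuition, here is a minimal NumPy sketch of that expression (not the repo's code: the `backwards_cumsum` stand-in, decay schedule, and reward values below are all assumed for illustration). Whenever the discounted-average reward falls below `self.expected_reward`, the result is negative:

```python
import numpy as np

def backwards_cumsum(x):
    # Hypothetical stand-in for _backwards_cumsum: sum of x from each position to the end.
    return np.cumsum(x[::-1])[::-1]

decays = 0.9 ** np.arange(5)                     # example discount weights
rewards = np.array([0.4, 0.9, 0.7, 0.5, 0.3])    # illustrative cumulative rewards
expected_reward = 0.5                            # running baseline (assumed value)

normalized_rewards = rewards / backwards_cumsum(decays) - expected_reward
print(normalized_rewards)  # entries below zero: this sequence scored below the baseline
```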

Thanks.

ofirnachum commented 7 years ago

Yes, that is expected. The intent of normalized_rewards is to measure "how much better/worse is this sequence than the average sequence", so a negative value just means the sequence scored below average.
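A negative advantage is actually what makes a REINFORCE-style update work in both directions: below-average sequences have their token probabilities pushed down. A minimal sketch of that idea (the baseline update form, the rate, and the names below are assumptions for illustration, not the repo's exact code):

```python
import numpy as np

def update_baseline(expected_reward, mean_sequence_reward, rate=0.01):
    # Assumed running-average baseline: drifts slowly toward recent sequence rewards.
    return (1.0 - rate) * expected_reward + rate * mean_sequence_reward

def reinforce_loss(token_log_probs, advantages):
    # A negative advantage flips the sign of the gradient, discouraging those tokens.
    return -np.sum(token_log_probs * advantages)

loss = reinforce_loss(np.log([0.2, 0.5]), np.array([-0.1, 0.3]))  # first token is below the baseline
```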

npark commented 7 years ago

Could you please explain how you calculate the reward and loss? Thanks.

ofirnachum commented 7 years ago

The discriminator's loss is a standard sigmoid cross-entropy loss on its per-token binary classification of a sequence as real or fake.
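In framework-agnostic terms, that loss looks roughly like the following (a sketch, not the repo's TensorFlow code; the names and shapes are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(per_token_logits, is_real):
    # per_token_logits: one real/fake score per token of a sequence.
    # is_real: 1.0 for tokens of real sequences, 0.0 for generated ones.
    p = sigmoid(per_token_logits)
    per_token = -(is_real * np.log(p) + (1.0 - is_real) * np.log(1.0 - p))
    return per_token.mean()  # standard sigmoid cross-entropy, averaged over tokens
```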

The reward for the generator at each token is the "real"-ness prediction (between 0 and 1) that the discriminator gave the sequence at that token. That is, we reward the generator for making the discriminator think that the sequence is real.
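Concretely, the per-token reward is just the discriminator's sigmoid output on the generated sequence; values near 1 mean the discriminator was fooled at that token. A tiny sketch with assumed logits:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Per-token rewards for the generator: the discriminator's "real" probability
# at each token of a generated sequence.
disc_logits_on_generated = np.array([-1.2, 0.3, 2.0, 0.8])
rewards = sigmoid(disc_logits_on_generated)
print(rewards)  # approx [0.23, 0.57, 0.88, 0.69]; higher = more convincing to the discriminator
```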