Closed slee01 closed 5 years ago
Hello, I am trying to implement GAIL with the Wasserstein distance, but I don't know how to handle the reward from the discriminator. I have tried several versions, and none of them work. I also looked at the InfoGAIL code, but I can't understand it well. Can you explain more about "in the authors' repo was just scaled the discriminator output"?
In the openai/baselines code, the GAIL reward is `self.reward_op = -tf.log(1 - tf.nn.sigmoid(generator_logits) + 1e-8)`, where `generator_logits` is the output of the discriminator network for generator data. After reading some papers, I decided to change it to `self.reward_op = generator_logits`, but that doesn't work either.
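To make the difference between the two formulations concrete, here is a minimal NumPy sketch (my own illustration, not code from either repo; the 0.2 scale just mirrors the InfoGAIL snippet below and is a tuning choice, not a fixed constant):

```python
import numpy as np

def gail_reward(generator_logits):
    """Standard GAIL reward as in openai/baselines:
    -log(1 - sigmoid(D(s, a)) + 1e-8). Always non-negative."""
    d = 1.0 / (1.0 + np.exp(-generator_logits))  # sigmoid
    return -np.log(1.0 - d + 1e-8)

def wgan_reward(generator_logits, scale=0.2):
    """WGAN-style reward: the raw critic output, optionally scaled.
    Unlike the GAIL reward, this can be negative, and only the
    relative gaps between outputs are meaningful."""
    return scale * generator_logits

logits = np.array([-2.0, 0.0, 2.0])
print(gail_reward(logits))  # all non-negative, increasing with the logit
print(wgan_reward(logits))  # can be negative
```

The sign difference matters for RL: a reward that is sometimes negative can push the policy toward terminating episodes early, whereas the baselines formulation is bounded below by zero.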
```python
path["rewards"] = np.ones(path["raws"].shape[0]) * 1.2 + \
                  output_d.flatten() * 0.2 + \
                  np.sum(np.log(output_p) * path["encodes"], axis=1)
```
Is this the code in InfoGAIL you mentioned? Thank you
Hi, thank you for your great repo!
I've been trying to implement InfoGAIL based on your repo, and I'm wondering about your opinion on GAIL with the Wasserstein distance. InfoGAIL is based on GAIL with the Wasserstein distance, and I checked that the reward function in the authors' repo is just a scaled version of the discriminator output. However, as far as I know, even a perfectly trained discriminator can produce only positive or only negative values for both agent and expert data, because the WGAN loss function considers only the gap between the outputs for fake and true data. I'm not sure, but I guess this is why you didn't implement the discriminator with the Wasserstein distance in your repo...
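The shift-invariance I'm describing can be checked with a tiny sketch (my own illustration under a simplified WGAN critic loss, not code from either repo):

```python
import numpy as np

def wgan_critic_loss(expert_out, agent_out):
    """Simplified WGAN critic objective: the critic maximizes
    E[D(expert)] - E[D(agent)], so as a loss we negate the gap.
    Note it depends only on the difference of the two means."""
    return -(np.mean(expert_out) - np.mean(agent_out))

expert = np.array([1.0, 2.0, 3.0])
agent = np.array([-1.0, 0.0, 1.0])

loss = wgan_critic_loss(expert, agent)
# Shift every critic output down by a constant: the loss is unchanged,
# so nothing stops an optimal critic from making all outputs negative.
shifted = wgan_critic_loss(expert - 10.0, agent - 10.0)
print(loss, shifted)
```

Because adding any constant to all critic outputs leaves the loss untouched, the raw output has no fixed sign, which is exactly why using it directly as a reward seems problematic to me.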
Could you share your opinion on what I'm confused about? I would really appreciate any help.