HareshKarnan opened this issue 4 years ago
I am also confused. What should I do if I just want the GAIL loss? Just reward = -(1 - s).log()?
If we look at the algorithm section of the GAIL paper, the proposed reward is log(D(.)), so just use that. For stability, add 1e-8 inside the log term, i.e. log(D(.) + 1e-8), so you don't get a huge negative reward when the discriminator's output is zero.
You can also try -log(1 - D(.) + 1e-8) [the alternative GAN loss].
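For reference, the two variants above could be sketched like this (plain Python instead of PyTorch tensors for clarity; the function names and the scalar `d` are illustrative, not from the repo):

```python
import math

def gail_reward(d, eps=1e-8):
    """GAIL-paper-style reward log(D(.) + eps), where d is the
    discriminator's probability that the sample is expert-like."""
    return math.log(d + eps)

def alt_gan_reward(d, eps=1e-8):
    """Alternative GAN-style reward -log(1 - D(.) + eps)."""
    return -math.log(1.0 - d + eps)

# Without eps, d = 0 would give log(0) = -inf; with eps the reward
# bottoms out near log(1e-8) instead of diverging.
print(round(gail_reward(0.0), 2))  # -18.42
```

Note that log(D) is always ≤ 0 (a cost the agent tries to push toward 0), while -log(1 - D) is always ≥ 0; both increase as the discriminator is fooled.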
I noticed that the predict_reward function uses log(D(.)) - log(1 - D(.)) as the reward for updating the generator. However, this is the reward proposed in the AIRL paper, which minimizes the reverse KL divergence rather than the JS divergence as in GAIL. Is it common for implementations to swap the GAIL reward for the AIRL reward?
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/84a7582477fb0d5c82ad6d850fe476829dddd2e1/a2c_ppo_acktr/algo/gail.py#L103
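For comparison, the reward the question refers to, log(D) - log(1 - D), is just the discriminator's logit. A minimal sketch (plain Python, illustrative names, not the repo's code):

```python
import math

def airl_style_reward(d, eps=1e-8):
    """Reward log(D + eps) - log(1 - D + eps): the discriminator's
    logit. Unlike log(D), which is capped at 0, this is unbounded
    in both directions."""
    return math.log(d + eps) - math.log(1.0 - d + eps)

# At d = 0.5 the discriminator is maximally uncertain, so the reward is 0.
print(airl_style_reward(0.5))  # 0.0
```

One practical consequence: with log(D) alone the agent always receives a non-positive reward, which can bias it toward ending episodes early, whereas the logit form can reward the agent positively once it fools the discriminator.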