uidilr / gail_ppo_tf

Tensorflow implementation of Generative Adversarial Imitation Learning(GAIL) with discrete action
MIT License

Your implementation is different from the Gail paper #8

Closed Guiliang closed 6 years ago

Guiliang commented 6 years ago

Nice implementation! But I found an inconsistency in your discriminator.py. If you check Equation 17 of the paper Generative Adversarial Imitation Learning, it should be 1 - expert_probability, but in line 35 of your discriminator.py it is 1 - policy_probability. Just want to confirm with you :)

uidilr commented 6 years ago

Thanks for pointing it out! You are right that it is inconsistent with the original paper. In the original paper, the discriminator outputs a "cost"; in my implementation, the discriminator outputs a "reward" instead. I implemented it this way because I am more familiar with rewards than with costs.
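The two conventions differ only in which class the discriminator's sigmoid output represents, so the labels in the binary cross-entropy loss are swapped. A minimal numpy sketch (not the repo's actual code; `d_expert`/`d_policy` are hypothetical discriminator outputs in (0, 1)):

```python
import numpy as np

def loss_cost_convention(d_expert, d_policy, eps=1e-10):
    # Paper-style: output ~= P(sample came from the policy), so expert
    # samples contribute log(1 - D) and policy samples contribute log(D).
    return -(np.mean(np.log(1.0 - d_expert + eps)) +
             np.mean(np.log(d_policy + eps)))

def loss_reward_convention(p_expert, p_policy, eps=1e-10):
    # Reward-style: output ~= P(sample came from the expert); the
    # cross-entropy labels are simply swapped.
    return -(np.mean(np.log(p_expert + eps)) +
             np.mean(np.log(1.0 - p_policy + eps)))

d_expert = np.array([0.1, 0.2])   # paper convention: low on expert data
d_policy = np.array([0.8, 0.9])   # paper convention: high on policy data

# With the outputs flipped (p = 1 - d) the two losses coincide exactly.
a = loss_cost_convention(d_expert, d_policy)
b = loss_reward_convention(1.0 - d_expert, 1.0 - d_policy)
```

So neither convention is wrong; they train the same classifier, as long as the policy-update side uses the matching sign.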

Guiliang commented 6 years ago

Thanks for your explanation! I guess it still works if I switch it to a cost, right?

uidilr commented 6 years ago

Yes, it should work if you find and rewrite all the code that uses the discriminator's outputs, such as https://github.com/uidilr/gail_ppo_tf/blob/f8f496f69c2a166e91164d9082d7f9fd5a80b6b5/run_gail.py#L97-L101

shamanez commented 6 years ago

I think nothing will change by switching the reward to a cost. It just turns the maximisation problem into a minimisation problem :) which is equivalent.
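The equivalence is just a sign flip. A quick sketch, assuming a hypothetical reward of the form -log(1 - D) (the exact transform in the repo may differ):

```python
import numpy as np

d = np.array([0.3, 0.6, 0.9])   # hypothetical discriminator outputs in (0, 1)

rewards = -np.log(1.0 - d)      # reward formulation: policy maximises this
costs = np.log(1.0 - d)         # cost formulation: policy minimises this

# costs == -rewards, so the transition the policy prefers most under the
# reward view is exactly the one with the smallest cost.
```

Because `costs = -rewards` element-wise, maximising expected return and minimising expected cost select the same policy gradient direction (up to the sign the optimiser applies).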