Nice, I finally found a project that updates the policy using the reward from the discriminator, matching the algorithm in the GAIL paper. Many other libraries just use the reward from the environment. I was wondering why they do that, and whether optimizing the policy with a reward that is decoupled from the discriminator can really maximize the GAIL objective.
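For concreteness, here is a minimal sketch of what a discriminator-derived reward looks like, assuming a PyTorch discriminator that outputs a logit for "came from the expert" (all names here are hypothetical, not this project's API; the label convention and reward form vary between implementations):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Hypothetical GAIL discriminator over (state, action) pairs."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # logit; sigmoid(logit) = P(expert)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def gail_reward(disc: Discriminator, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
    # The environment reward is never consulted: the policy is trained purely
    # on this discriminator-derived signal. Computing it under no_grad is the
    # usual practice, since the RL update (e.g. TRPO/PPO) treats the reward as
    # a scalar and the policy gradient flows through the log-prob term instead.
    with torch.no_grad():
        logits = disc(obs, act)
        # -log(1 - D) written as softplus(logit) for numerical stability;
        # use log D = -softplus(-logit) if your label convention is flipped.
        return F.softplus(logits)
```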