Closed. mehdimashayekhi closed this issue 5 years ago.
Hi, thanks for sharing this. Quick question: here, where you collect rewards (https://github.com/uidilr/gail_ppo_tf/blob/1dc3c3400d5b6329b49f3f18bdc89f3e475a023a/run_gail.py#L59), I don't think this is right. The actions and rewards are not aligned: you are pairing the old reward with the new action. I think you need to move the reward append to after the `step` call (https://github.com/uidilr/gail_ppo_tf/blob/1dc3c3400d5b6329b49f3f18bdc89f3e475a023a/run_gail.py#L62) and append the reward there. For `run_gail.py` this is okay, because the environment reward is not used for the update (the discriminator reward is used instead), but for `run_ppo` I guess it is not correct, especially when we initially train the expert.
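To make the ordering issue concrete, here is a minimal sketch of a rollout loop, assuming the classic Gym API where `env.step` returns `(obs, reward, done, info)`; the environment and buffer names here are hypothetical, not the repository's actual code:

```python
# Hedged sketch of a rollout loop illustrating reward/action alignment.
import gym

env = gym.make("CartPole-v0")
obs = env.reset()
observations, actions, rewards = [], [], []

done = False
while not done:
    act = env.action_space.sample()  # stand-in for the policy's action

    observations.append(obs)
    actions.append(act)
    # Buggy ordering: appending the reward here would pair the *previous*
    # step's reward with the action chosen on *this* step.
    # rewards.append(reward)

    obs, reward, done, _ = env.step(act)

    # Correct ordering: append the reward returned by env.step, so that
    # rewards[t] is the reward produced by actions[t].
    rewards.append(reward)
```

With the append after `env.step`, `rewards[t]` matches `actions[t]`, which matters whenever the environment reward is used for the policy update (as in PPO training of the expert).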
Hello @mehdimashayekhi, sorry for the late reply and thank you for pointing it out!
I just solved this issue and confirmed that it learns appropriately.