uidilr / gail_ppo_tf

TensorFlow implementation of Generative Adversarial Imitation Learning (GAIL) with discrete actions
MIT License

Rollout of the policy and collecting data, especially reward #20

Closed: mehdimashayekhi closed this issue 5 years ago

mehdimashayekhi commented 6 years ago

Hi, thanks for sharing this. Quick question: here, where you collect rewards, https://github.com/uidilr/gail_ppo_tf/blob/1dc3c3400d5b6329b49f3f18bdc89f3e475a023a/run_gail.py#L59, the appended reward does not correspond to the appended action; you are pairing the previous step's reward with the new action. I think you need to move the reward append to after the environment step, https://github.com/uidilr/gail_ppo_tf/blob/1dc3c3400d5b6329b49f3f18bdc89f3e475a023a/run_gail.py#L62, and append the reward there. For run_gail.py this happens to be harmless, because the environment reward is not used for updates (the discriminator reward is used instead), but for run_ppo.py I believe it is incorrect, especially when initially training the expert.
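To make the misalignment concrete, here is a minimal sketch of a rollout loop, assuming the classic gym step API; the environment name and the random-action stand-in for the policy are placeholders, not the repository's code:

```python
import gym

env = gym.make("CartPole-v0")

observations, actions, rewards = [], [], []
obs = env.reset()

for _ in range(200):
    # stand-in for the policy's action selection
    act = env.action_space.sample()

    observations.append(obs)
    actions.append(act)
    # BUG (original code): rewards.append(reward) here would store the
    # reward of the *previous* step next to the current action,
    # producing misaligned (s_t, a_t, r_{t-1}) tuples.

    obs, reward, done, _ = env.step(act)

    # FIX: append after env.step() so the reward belongs to the action
    # just taken, giving aligned (s_t, a_t, r_t) tuples.
    rewards.append(reward)

    if done:
        obs = env.reset()
```

This off-by-one only matters when the stored rewards feed the policy update, which is why run_gail.py (which learns from the discriminator's reward) masks the bug while run_ppo.py would not.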

uidilr commented 5 years ago

Hello @mehdimashayekhi, sorry for the late reply and thank you for pointing it out!

I just solved this issue and confirmed that it learns appropriately.