sail-sg / envpool

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
https://envpool.readthedocs.io
Apache License 2.0

[BUG] Transition pair misalignment in the `ppo_atari` example #174

fuyw closed this issue 2 years ago

fuyw commented 2 years ago

Describe the bug

In the ppo_atari example, we sample the action:

https://github.com/sail-sg/envpool/blob/ea86c2b77d12aaa58725bfeb1d701e3207f11822/examples/ppo_atari/ppo.py#L236

after we receive transitions from the train_envs:

https://github.com/sail-sg/envpool/blob/ea86c2b77d12aaa58725bfeb1d701e3207f11822/examples/ppo_atari/ppo.py#L231

Then the tuple (obs, act, rew, done, log_prob, value) is added to the batch.

However, it seems that obs = o_{t+1}, act = a_{t+1}, rew = r_t, and done = d_t correspond to two different timestamps.


Trinkle23897 commented 2 years ago

> correspond to two different timestamps.

The order is corrected in gae.py:

https://github.com/sail-sg/envpool/blob/ea86c2b77d12aaa58725bfeb1d701e3207f11822/examples/ppo_atari/gae.py#L41-L48
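For readers following along, the idea can be shown with a toy sketch. This is not the code in gae.py; `collect_rows` and `realign` are hypothetical helpers, and the environment and policy are made up so that the alignment is easy to check. Each stored row pairs the current observation and action with the reward and done flag produced by the previous action, so shifting the reward and done columns by one step pairs each (obs, act) with its own outcome.

```python
def collect_rows(num_steps):
    """Toy rollout that records rows in the order described in the issue.

    The action is chosen after the env output arrives, so each row holds
    the current (obs, act) next to the (rew, done) of the PREVIOUS action.
    """
    rows = []
    # Initial observation; rew/done here are placeholders with no action behind them.
    obs, rew, done = 0, 0.0, False
    for _ in range(num_steps):
        act = obs * 10  # made-up "policy": action derived from the obs just received
        rows.append((obs, act, rew, done))
        # Made-up environment: the reward encodes the action that was just taken.
        obs, rew, done = obs + 1, float(act) + 0.5, False
    return rows


def realign(rows):
    """Shift rew/done by one step so row t holds (o_t, a_t, r_t, d_t)."""
    obs, act, rew, done = (list(col) for col in zip(*rows))
    # The last (obs, act) has no recorded outcome yet, so it is dropped;
    # the first (rew, done) is the placeholder, so it is dropped as well.
    return obs[:-1], act[:-1], rew[1:], done[1:]


rows = collect_rows(4)
obs, act, rew, done = realign(rows)
```

After the shift, every reward in this toy setup equals its paired action plus 0.5, which confirms that each row now describes a single timestep.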

fuyw commented 2 years ago

Many thanks for the explanation.