uvipen / Super-mario-bros-PPO-pytorch

Proximal Policy Optimization (PPO) algorithm for Super Mario Bros
MIT License
1.07k stars, 201 forks

a question regarding action sampling #23

Closed: Christian-lyc closed this issue 7 months ago

Christian-lyc commented 7 months ago

Hi, thank you so much for your work; it inspires me a lot. One thing I'm not clear on is whether I should use action sampling in the rollout phase. Judging from the code accompanying the PPO blog post (repo: CleanRL; post: The 37 Implementation Details of Proximal Policy Optimization), it seems that it should. I hope you can help explain this. Thank you.

Best regards, Yichao
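For context on the question: in PPO training loops such as CleanRL's ppo.py, each rollout action is drawn by sampling from the policy's categorical distribution (there via `torch.distributions.Categorical(logits=...).sample()`), and its log-probability is stored for the later probability ratio. Greedy argmax selection is typically reserved for evaluation. The dependency-free sketch below illustrates that difference. It is not code from this repository, and the function names and example logits are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], rng: random.Random) -> Tuple[int, float]:
    """Rollout-time selection (hypothetical helper): sample an action from the
    categorical policy and return (action, log_prob). The log-prob is stored so
    the PPO update can form the ratio pi_new(a|s) / pi_old(a|s)."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


def greedy_action(logits: Sequence[float]) -> int:
    """Deterministic alternative (argmax), typically used only for evaluation."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.1]  # hypothetical logits for a 3-action policy
    counts = [0, 0, 0]
    for _ in range(10_000):
        a, _ = sample_action(logits, rng)
        counts[a] += 1
    print("sampled frequencies:", [c / 10_000 for c in counts])
    print("policy probabilities:", [round(p, 3) for p in softmax(logits)])
    print("greedy action:", greedy_action(logits))
```

Sampling keeps exploration in the rollout, and it matches an assumption behind the PPO objective: the stored actions were drawn from the old policy whose log-probs were recorded. Rolling out with argmax would break that assumption.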