nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License
1.66k stars, 343 forks

Fix Squeeze Under 1d Action Case #22

Closed: xunzhang closed this 4 years ago

xunzhang commented 4 years ago

When the action space is one-dimensional (e.g. for debugging), the current code is incorrect because its squeeze also removes the action dimension. This PR fixes the issue.
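For readers unfamiliar with the failure mode, here is a minimal sketch of how an argument-less `torch.squeeze` behaves when the action dimension is 1. The tensor layout `(batch, 1, action_dim)` and the helper names `squeeze_all` and `squeeze_step_axis` are assumptions made for illustration; they are not taken from the PR diff, and the actual fix in this PR may differ.

```python
import torch


def squeeze_all(x: torch.Tensor) -> torch.Tensor:
    # torch.squeeze without `dim` removes every axis of size 1.
    return torch.squeeze(x)


def squeeze_step_axis(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical alternative: drop only axis 1, keep the action axis.
    return torch.squeeze(x, dim=1)


batch = 5
# Assumed layout: (batch, 1, action_dim)
actions_4d = torch.zeros(batch, 1, 4)  # e.g. a 4-dimensional action space
actions_1d = torch.zeros(batch, 1, 1)  # 1-dimensional action space

shape_4d_all = tuple(squeeze_all(actions_4d).shape)        # action axis survives
shape_1d_all = tuple(squeeze_all(actions_1d).shape)        # action axis is lost
shape_1d_dim = tuple(squeeze_step_axis(actions_1d).shape)  # action axis survives

print(shape_4d_all, shape_1d_all, shape_1d_dim)
```

With a 4-dimensional action the argument-less squeeze leaves a `(batch, action_dim)` tensor, but with a 1-dimensional action it collapses the result to `(batch,)`, which code expecting a trailing action axis cannot handle. Squeezing a specific axis is one way to avoid this under the assumed layout.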

nikhilbarhate99 commented 4 years ago

Did you test it with Bipedal Walker env after the commit?

xunzhang commented 4 years ago

Yes, please see the log from around episode 4000 below.

Episode 3620     Avg length: 1084    Avg reward: 199
Episode 3640     Avg length: 972     Avg reward: 160
Episode 3660     Avg length: 745     Avg reward: 87
Episode 3680     Avg length: 994     Avg reward: 169
Episode 3700     Avg length: 1091    Avg reward: 194
Episode 3720     Avg length: 978     Avg reward: 163
Episode 3740     Avg length: 1150    Avg reward: 219
Episode 3760     Avg length: 1062    Avg reward: 174
Episode 3780     Avg length: 1014    Avg reward: 156
Episode 3800     Avg length: 1062    Avg reward: 172
Episode 3820     Avg length: 1182    Avg reward: 205
Episode 3840     Avg length: 1150    Avg reward: 195
Episode 3860     Avg length: 929     Avg reward: 140
Episode 3880     Avg length: 1118    Avg reward: 205
Episode 3900     Avg length: 1020    Avg reward: 160
Episode 3920     Avg length: 1030    Avg reward: 170
Episode 3940     Avg length: 1027    Avg reward: 163
Episode 3960     Avg length: 1011    Avg reward: 155
Episode 3980     Avg length: 1003    Avg reward: 151
Episode 4000     Avg length: 994     Avg reward: 155
Episode 4020     Avg length: 1038    Avg reward: 151
Episode 4040     Avg length: 1069    Avg reward: 170
Episode 4060     Avg length: 946     Avg reward: 134
Episode 4080     Avg length: 1078    Avg reward: 179
Episode 4100     Avg length: 1119    Avg reward: 198
Episode 4120     Avg length: 967     Avg reward: 153
Episode 4140     Avg length: 1037    Avg reward: 160
Episode 4160     Avg length: 997     Avg reward: 159
Episode 4180     Avg length: 1207    Avg reward: 212
Episode 4200     Avg length: 983     Avg reward: 145
Episode 4220     Avg length: 1181    Avg reward: 205
Episode 4240     Avg length: 1169    Avg reward: 211
Episode 4260     Avg length: 1205    Avg reward: 212
Episode 4280     Avg length: 1232    Avg reward: 227
Episode 4300     Avg length: 1186    Avg reward: 199
Episode 4320     Avg length: 1256    Avg reward: 227