nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Fix for RuntimeError for Environments with single continuous actions. #41

Closed · Aakarshan-chauhan closed 3 years ago

Aakarshan-chauhan commented 3 years ago

For environments with action shape (1,), the sampled action values come back as a 1D array instead of a batch, which causes the RuntimeError. This PR makes minor changes to fix that.
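
To see the failure mode concretely, here is a minimal sketch of the shape problem (this is not the repository's code; the MultivariateNormal setup and shapes are assumed from the usual continuous-action PPO pattern):

    import torch
    from torch.distributions import MultivariateNormal

    action_dim = 1  # single continuous action, i.e. action shape (1,)
    dist = MultivariateNormal(
        torch.zeros(action_dim),
        torch.diag(torch.full((action_dim,), 0.25)),
    )

    # Stacking sampled actions and squeezing, as a rollout buffer might,
    # collapses the batch of (1,)-shaped actions into a single 1D tensor.
    actions = torch.stack([dist.sample() for _ in range(5)]).squeeze()
    print(actions.shape)  # torch.Size([5]) -- no batch dimension left

    # Restoring the batch dimension gives the (batch, action_dim) shape
    # that MultivariateNormal.log_prob expects during the PPO update.
    actions = actions.reshape(-1, 1)
    print(actions.shape)  # torch.Size([5, 1])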

nikhilbarhate99 commented 3 years ago

Hey, thanks for the pull request. But to keep the fix explicit and avoid affecting performance on other environments, change it to:

if self.action_dim == 1:
    action = action.reshape(-1, 1)
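
For context, here is a hedged sketch of where such a guard could sit in a continuous-action selection path (the class, select_action, and attribute names below are illustrative stand-ins, not the repository's exact code):

    import torch
    import torch.nn as nn
    from torch.distributions import MultivariateNormal

    class TinyPolicy:
        # Minimal stand-in for a continuous-action PPO policy (assumed names).
        def __init__(self, state_dim, action_dim, action_std=0.5):
            self.action_dim = action_dim
            self.actor = nn.Linear(state_dim, action_dim)  # toy mean network
            self.cov_mat = torch.diag(torch.full((action_dim,), action_std ** 2))

        def select_action(self, state):
            action_mean = self.actor(state)
            dist = MultivariateNormal(action_mean, self.cov_mat)
            action = dist.sample()
            # The explicit guard suggested above: only single-action
            # environments get the extra batch dimension, so every other
            # environment's code path is untouched.
            if self.action_dim == 1:
                action = action.reshape(-1, 1)
            return action

    policy = TinyPolicy(state_dim=3, action_dim=1)
    print(policy.select_action(torch.randn(3)).shape)  # torch.Size([1, 1])

Gating on action_dim == 1 rather than reshaping unconditionally matches the maintainer's stated reason: the fix stays explicit and leaves every other environment's code path alone.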