nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

RuntimeError for Environments with single continuous action #40

Closed Aakarshan-chauhan closed 3 years ago

Aakarshan-chauhan commented 3 years ago

For environments like Pendulum-v0 or MountainCarContinuous-v0, the following RuntimeError is raised:


  File "D:\My C and Python Projects\Repos\PPO-PyTorch\train.py", line 301, in <module>
    train()
  File "D:\My C and Python Projects\Repos\PPO-PyTorch\train.py", line 229, in train
    ppo_agent.update()
  File "D:\My C and Python Projects\Repos\PPO-PyTorch\PPO.py", line 244, in update
    logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)
  File "D:\My C and Python Projects\Repos\PPO-PyTorch\PPO.py", line 129, in evaluate
    action_logprobs = dist.log_prob(action)
  File "D:\users\Aakarshan\miniconda3\envs\torch\lib\site-packages\torch\distributions\multivariate_normal.py", line 208, in log_prob
    M = _batch_mahalanobis(self._unbroadcasted_scale_tril, diff)
  File "D:\users\Aakarshan\miniconda3\envs\torch\lib\site-packages\torch\distributions\multivariate_normal.py", line 54, in _batch_mahalanobis
    flat_L = bL.reshape(-1, n, n)  # shape = b x n x n
RuntimeError: shape '[-1, 800, 800]' is invalid for input of size 800
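This looks like a shape mismatch in `evaluate()`: for an environment with a single continuous action, the buffered actions get stacked into a 1-D tensor of shape `(batch,)` instead of `(batch, 1)`, which breaks broadcasting inside `MultivariateNormal.log_prob`. Below is a minimal sketch of a workaround, assuming the general `evaluate()` structure in this repo's `PPO.py` (attributes like `self.action_dim` and `self.action_var` are taken from that file; the exact bodies may differ):

```python
import torch
from torch.distributions import MultivariateNormal

def evaluate(self, state, action):
    # actor outputs the mean of the action distribution, shape (batch, action_dim)
    action_mean = self.actor(state)

    # diagonal covariance built from the (fixed) per-dimension action variance
    action_var = self.action_var.expand_as(action_mean)
    cov_mat = torch.diag_embed(action_var)
    dist = MultivariateNormal(action_mean, cov_mat)

    # workaround (assumption): for single-action environments the stored
    # actions come in as shape (batch,); reshape to (batch, 1) so log_prob
    # sees one 1-D event per sample instead of one batch-sized event
    if self.action_dim == 1:
        action = action.reshape(-1, self.action_dim)

    action_logprobs = dist.log_prob(action)
    dist_entropy = dist.entropy()
    state_values = self.critic(state)

    return action_logprobs, state_values, dist_entropy
```

Alternatively, reshaping each sampled action to `(1,)` before appending it to the rollout buffer would keep the actions 2-D after stacking and avoid the special case in `evaluate()` entirely.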