nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License
1.57k stars, 332 forks

Continuous action space should use Independent Normal instead of MultivariateNormal #60

Open imathg opened 1 year ago

imathg commented 1 year ago

Since the code uses a diagonal covariance matrix, MultivariateNormal reduces to an Independent Normal. MultivariateNormal computes a Cholesky decomposition, which is extremely slow even when the covariance matrix is low-dimensional (e.g., 4x4), and even on GPU.

Using Independent Normal instead is quite fast.
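
For illustration, here is a minimal sketch of the equivalence, not the repo's actual code. The names `action_dim`, `action_mean`, and `action_var` and the fixed standard deviation of 0.6 are assumptions made for this example; a random tensor stands in for the actor network's output. It builds both distributions from the same mean and diagonal variance and checks that they give the same log-probabilities and entropies.

```python
import torch
from torch.distributions import Independent, MultivariateNormal, Normal

torch.manual_seed(0)

# Hypothetical setup: a batch of 8 action means for a 4-dimensional action space.
action_dim = 4
action_mean = torch.randn(8, action_dim)            # stand-in for the actor output
action_var = torch.full((action_dim,), 0.6 * 0.6)   # assumed fixed per-dimension variance

# MultivariateNormal with a diagonal covariance matrix.
# Constructing it triggers a Cholesky factorization of cov_mat.
cov_mat = torch.diag(action_var)
mvn = MultivariateNormal(action_mean, covariance_matrix=cov_mat)

# Equivalent distribution: per-dimension Normals whose last batch dimension
# is reinterpreted as the event dimension. Note that Normal takes the
# standard deviation, so we pass the square root of the variance.
ind = Independent(Normal(action_mean, action_var.sqrt()), 1)

action = ind.sample()
logp_mvn = mvn.log_prob(action)   # shape (8,)
logp_ind = ind.log_prob(action)   # shape (8,), summed over the action dimension

same_log_prob = torch.allclose(logp_mvn, logp_ind, atol=1e-5)
same_entropy = torch.allclose(mvn.entropy(), ind.entropy(), atol=1e-5)
print(same_log_prob, same_entropy)
```

Because `Independent(..., 1)` sums the per-dimension log-probabilities, `log_prob` and `entropy` keep the same shapes as with MultivariateNormal, so the surrounding PPO update should not need to change under these assumptions.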

smiles724 commented 1 year ago

Agree with that point.