quangr / jax-rl

JAX implementation of the PPO algorithm for MuJoCo environments, achieving SOTA (tianshou)

The impact of random shuffle #4

Open quangr opened 1 year ago

quangr commented 1 year ago

We should write a report on whether random shuffling improves performance. Some researchers believe that shuffling the buffer data reduces the covariance between samples, which leads to a better gradient approximation (and may help avoid catastrophic forgetting?).

If we comment out jax.random.permutation(subkey, x) in the HalfCheetah-v3 environment, training breaks with the following message: "Nan, Inf or huge value in CTRL at ACTUATOR 0. The simulation is unstable. Time = 1.1500"
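For the report, it may help to make the shuffle a switch rather than a commented-out line, so both variants run through the same code path. The sketch below is only an illustration, not the code in this repo: make_minibatches, its shuffle flag, and the toy batch with "obs" and "adv" keys are all hypothetical names, and it permutes indices instead of calling jax.random.permutation(subkey, x) on the data directly, so that every array in the batch stays aligned.

```python
import jax
import jax.numpy as jnp


def make_minibatches(key, batch, num_minibatches, shuffle=True):
    """Split a flat rollout batch into minibatches, optionally shuffled.

    `batch` is a pytree of arrays sharing the same leading dimension, which
    is assumed to be divisible by `num_minibatches` (hypothetical helper).
    """
    batch_size = jax.tree_util.tree_leaves(batch)[0].shape[0]
    if shuffle:
        # One shared index permutation keeps all fields aligned per sample.
        idx = jax.random.permutation(key, batch_size)
    else:
        # Ablation: keep the samples in their original rollout order.
        idx = jnp.arange(batch_size)
    # Reshape every leaf to (num_minibatches, minibatch_size, ...).
    return jax.tree_util.tree_map(
        lambda x: x[idx].reshape((num_minibatches, -1) + x.shape[1:]), batch
    )


key = jax.random.PRNGKey(0)
key, subkey = jax.random.split(key)

# Toy stand-in for a rollout buffer: 8 samples, 1-dimensional observations.
batch = {
    "obs": jnp.arange(8.0).reshape(8, 1),
    "adv": jnp.arange(8.0),
}

shuffled = make_minibatches(subkey, batch, num_minibatches=4, shuffle=True)
ordered = make_minibatches(subkey, batch, num_minibatches=4, shuffle=False)
```

Running the same seeds with shuffle=True and shuffle=False would give a direct comparison of the two settings, including whether the instability only shows up in the unshuffled runs.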