The impact of random shuffle

We should write a report on whether random shuffling helps improve performance. Some researchers believe that shuffling the buffer data reduces the covariance between the samples in a minibatch, which leads to a better gradient approximation (and may help avoid catastrophic forgetting?).
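To make the comparison concrete, here is a minimal sketch of splitting a flat rollout buffer into minibatches with and without shuffling. The helper name `make_minibatches` and its parameters are hypothetical and not taken from this repo; it assumes the buffer's leading axis indexes samples and that its length is divisible by the number of minibatches.

```python
import jax
import jax.numpy as jnp


def make_minibatches(key, batch, num_minibatches, shuffle=True):
    """Split a flat batch (leading axis = samples) into minibatches.

    Hypothetical helper for illustration; assumes batch.shape[0] is
    divisible by num_minibatches.
    """
    n = batch.shape[0]
    idx = jnp.arange(n)
    if shuffle:
        # Randomly reorder sample indices so each minibatch mixes
        # transitions from different time steps.
        idx = jax.random.permutation(key, idx)
    batch = batch[idx]
    return batch.reshape((num_minibatches, -1) + batch.shape[1:])


key = jax.random.PRNGKey(0)
key, subkey = jax.random.split(key)
x = jnp.arange(8.0)  # stand-in for 8 consecutive transitions

ordered = make_minibatches(subkey, x, 2, shuffle=False)
shuffled = make_minibatches(subkey, x, 2, shuffle=True)
```

Without shuffling, each minibatch holds consecutive (and therefore likely correlated) transitions; with shuffling, the same samples are spread across minibatches. Running training with both settings under the same seed would be one way to measure the effect for the report.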

If we comment out jax.random.permutation(subkey, x) in the HalfCheetah-v3 env, the simulator reports: "Nan, Inf or huge value in CTRL at ACTUATOR 0. The simulation is unstable. Time = 1.1500"
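One way to locate where the bad control values first appear is to check actions for non-finite entries before they are passed to the environment. The sketch below is an assumption-laden debugging aid, not a fix for the underlying instability: `sanitize_action` is a hypothetical helper, and the bounds of -1 and 1 are assumed to match the HalfCheetah action space.

```python
import jax.numpy as jnp


def sanitize_action(action, low=-1.0, high=1.0):
    """Return a clipped, finite action plus a flag saying whether the
    original action was entirely finite.

    Hypothetical helper; low/high are assumed action bounds.
    """
    # True only if no entry is NaN or +/-Inf.
    all_finite = jnp.all(jnp.isfinite(action))
    # Replace NaN with 0 and +/-Inf with the bounds, then clip.
    safe = jnp.nan_to_num(action, nan=0.0, posinf=high, neginf=low)
    safe = jnp.clip(safe, low, high)
    return safe, all_finite


action = jnp.array([0.5, jnp.nan, jnp.inf, -3.0])
safe_action, was_finite = sanitize_action(action)
```

Logging the update step at which `was_finite` first becomes false, with and without the permutation, would show whether the policy parameters diverge during training rather than the simulator failing on its own.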

quangr / jax-rl

The impact of random shuffle #4