TODO: write a report on whether random shuffling helps improve performance. Some researchers believe that shuffling the buffer data leads to less covariance between the samples in a batch, which in turn leads to a better gradient approximation (and may help avoid catastrophic forgetting?).
If jax.random.permutation(subkey, x) is commented out, the HalfCheetah-v3 env fails with:
Nan, Inf or huge value in CTRL at ACTUATOR 0. The simulation is unstable. Time = 1.1500
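
For the report, a small helper with a shuffle on/off switch would make the comparison easy to run. The following is only a sketch under assumptions: the buffer is taken to be a dict of arrays sharing a leading sample axis, and the names make_minibatches, num_minibatches and the shuffle flag are hypothetical, not taken from this code base. It draws one index permutation with jax.random.permutation and applies it to every array, so observations and advantages stay aligned.

```python
import jax
import jax.numpy as jnp


def make_minibatches(key, batch, num_minibatches, shuffle=True):
    """Split a batch (pytree of arrays with a shared leading axis) into minibatches.

    Hypothetical helper: with shuffle=True the samples are permuted first,
    with shuffle=False they keep their original (time-correlated) order.
    The leading axis must be divisible by num_minibatches.
    """
    n = jax.tree_util.tree_leaves(batch)[0].shape[0]
    # One shared permutation of indices keeps all arrays aligned.
    idx = jax.random.permutation(key, n) if shuffle else jnp.arange(n)

    def reorder_and_split(x):
        x = jnp.take(x, idx, axis=0)
        return x.reshape((num_minibatches, -1) + x.shape[1:])

    return jax.tree_util.tree_map(reorder_and_split, batch)


key = jax.random.PRNGKey(0)
key, subkey = jax.random.split(key)

# Toy buffer: 8 samples, observation row i is [3i, 3i+1, 3i+2], advantage i is i.
batch = {
    "obs": jnp.arange(8 * 3, dtype=jnp.float32).reshape(8, 3),
    "adv": jnp.arange(8, dtype=jnp.float32),
}

shuffled = make_minibatches(subkey, batch, num_minibatches=4, shuffle=True)
ordered = make_minibatches(subkey, batch, num_minibatches=4, shuffle=False)
```

Running the same training loop once with shuffle=True and once with shuffle=False, using the same seeds, would give the two curves the report needs. Whether the unshuffled run reproduces the instability above is something to check, not something this sketch assumes.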