Initial actor loss issue when running continuous PPO

Describe the bug The initial actor loss is far too large when training PPO in a continuous-action MuJoCo environment.

To Reproduce python main.py --config config.ppo.mujoco --env.name half_cheetah --agent.n_step 2048 --train.num_workers 8

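Expected behavior The probability ratio, and therefore the actor loss, should stay finite and close to 1 at the start of training. Instead, a very large or NaN ratio (actor loss) occurs.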
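For context, here is a minimal sketch of how the ratio in a Gaussian-policy PPO update can blow up. This is not JORLDY's code: the function names, the `max_log_ratio` clamp, and the example values are all assumptions made for illustration. The clamp is one commonly used guard, not a confirmed fix for this issue.

```python
import math

def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for a, m, s in zip(action, mean, std)
    )

def ppo_ratio(new_log_prob, old_log_prob, max_log_ratio=20.0):
    """pi_new(a|s) / pi_old(a|s), computed in log space.

    max_log_ratio is a hypothetical guard: the log-ratio is clamped
    before exponentiating so the result cannot overflow.
    """
    log_ratio = new_log_prob - old_log_prob
    log_ratio = max(-max_log_ratio, min(max_log_ratio, log_ratio))
    return math.exp(log_ratio)

def clipped_actor_loss(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate for one sample, negated for minimization."""
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return -min(ratio * advantage, clipped * advantage)

# Hypothetical 6-dimensional action (HalfCheetah has 6 action dimensions).
action = [0.0] * 6
old_lp = gaussian_log_prob(action, [0.0] * 6, [1.0] * 6)
# Assumed scenario: the updated policy has a much smaller std than the old one.
new_lp = gaussian_log_prob(action, [0.0] * 6, [0.1] * 6)

ratio = ppo_ratio(new_lp, old_lp)
print("ratio:", ratio)
print("actor loss with advantage -1:", clipped_actor_loss(ratio, -1.0))
```

Because the log-probability is summed over action dimensions, a modest per-dimension mismatch between the old and new policy compounds in the exponent. Note also that the clipped surrogate does not bound the loss when the advantage is negative and the ratio is large, which is one way a very large initial actor loss can appear.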

Development Env. (OS, version, libraries): Linux, V2XLARGE, jorldy:0.3.0


kakaoenterprise / JORLDY