Phlogiston90 closed this issue 5 years ago
The easiest way to use normalized actions is to scale the actions directly by a factor of env.action_space.high[0], as is done in these two repos: https://github.com/sfujim/TD3 https://github.com/openai/spinningup/tree/master/spinup/algos/sac
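To make the scaling suggestion concrete, here is a minimal sketch. It assumes the policy output is already squashed to [-1, 1] (for example by tanh). The helper names scale_symmetric and scale_affine and the bounds are made up for illustration and are not taken from either repo. Multiplying by high[0] only works when the bounds are symmetric and identical across dimensions; the affine version is one way to handle the general case.

```python
# Sketch only: plain Python lists stand in for the policy output and the
# Box bounds; with gym you would read the bounds from env.action_space.

def scale_symmetric(action, max_action):
    """Scale a policy output in [-1, 1] by max_action (e.g. env.action_space.high[0])."""
    return [max_action * a for a in action]


def scale_affine(action, low, high):
    """Map each component from [-1, 1] to [low[i], high[i]] (asymmetric bounds)."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo) for a, lo, hi in zip(action, low, high)]


# Hypothetical bounds, chosen only for this example.
low, high = [-2.0, -2.0], [2.0, 2.0]
policy_output = [0.5, -1.0]

print(scale_symmetric(policy_output, high[0]))  # [1.0, -2.0]
print(scale_affine(policy_output, low, high))   # [1.0, -2.0]
```

With symmetric bounds both functions agree, which is why the one-line scaling is enough for many continuous-control environments.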
And yes, _max_episode_steps is not part of gym.ActionWrapper (I don't understand why I used it there). You can check how _max_episode_steps works here: https://github.com/openai/gym/blob/85a5372a19c0f35db2410e586cc9a32c4d94bf1a/gym/wrappers/time_limit.py https://github.com/openai/gym/blob/239aaf14ce804c9ce5068bfb69590110ea8ef1be/gym/envs/registration.py
Thanks a lot! :-)
Be careful when uncommenting the normalized actions wrapper: you have to make sure _reverse_action() is called, and _max_episode_steps has a typo and should not be a function. Otherwise the following line in main.py would not work: mask = 1 if episode_steps == env._max_episode_steps else float(not done)
This small bug caused a lot of headaches, but the repo is super nice otherwise!
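One way this failure can show up, sketched with hypothetical stand-in classes (EnvWithAttribute and EnvWithMethod are not the repo's code): if _max_episode_steps is a method rather than an integer attribute, the comparison in the mask line compares an int with a bound method, which is always False. No exception is raised, so an episode that ended only because of the time limit gets a mask of 0.0 instead of 1.

```python
class EnvWithAttribute:
    """Hypothetical stand-in: the step limit is a plain integer attribute."""
    _max_episode_steps = 1000


class EnvWithMethod:
    """Hypothetical stand-in: the step limit is (wrongly) defined as a method."""
    def _max_episode_steps(self):
        return 1000


def compute_mask(env, episode_steps, done):
    # Same expression as the line quoted from main.py.
    return 1 if episode_steps == env._max_episode_steps else float(not done)


# Episode cut off by the time limit: done is True only because of the limit.
print(compute_mask(EnvWithAttribute(), 1000, True))  # 1
print(compute_mask(EnvWithMethod(), 1000, True))     # 0.0
```

Because the wrong version fails silently, checking isinstance(env._max_episode_steps, int) once at startup is a cheap guard.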