pranz24 / pytorch-soft-actor-critic

PyTorch implementation of soft actor critic
MIT License

Normalized Actions has bugs #12

Closed Phlogiston90 closed 5 years ago

Phlogiston90 commented 5 years ago

One should be careful when uncommenting the NormalizedActions wrapper: you have to make sure _reverse_action() is actually called, and _max_episode_steps is mistakenly defined as a function when it should be an attribute. Otherwise the following line in main.py does not work:

mask = 1 if episode_steps == env._max_episode_steps else float(not done)
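To make this concrete, here is a rough sketch of what I mean (not the exact code from this repo, and method names differ between gym versions; older releases use _action/_reverse_action instead of action/reverse_action). If _max_episode_steps is a method, env._max_episode_steps evaluates to a bound method rather than an int, so the equality check above is never true:

```python
import gym

class NormalizedActions(gym.ActionWrapper):
    """Rescale agent actions from [-1, 1] to the env's [low, high] range."""

    def action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return low + (action + 1.0) * 0.5 * (high - low)

    def reverse_action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return 2.0 * (action - low) / (high - low) - 1.0

    # Buggy: declared as a method, so `env._max_episode_steps` is a bound
    # method and `episode_steps == env._max_episode_steps` is always False.
    # def _max_episode_steps(self):
    #     return self.env._max_episode_steps

    # One possible fix: re-expose the underlying TimeLimit attribute.
    @property
    def _max_episode_steps(self):
        return self.env._max_episode_steps
```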

This small bug caused a lot of headaches, but the repo is super nice otherwise!

pranz24 commented 5 years ago

True

The easiest way to use normalized actions would be to directly scale the actions by a factor of env.action_space.high[0], like it is done in these two repos:
https://github.com/sfujim/TD3
https://github.com/openai/spinningup/tree/master/spinup/algos/sac
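Roughly something like this sketch, where the policy's tanh output in [-1, 1] is rescaled by max_action (Pendulum-v0 and the network sizes here are just placeholders, not the actual code from either repo):

```python
import gym
import torch
import torch.nn as nn

env = gym.make("Pendulum-v0")
max_action = float(env.action_space.high[0])

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        self.max_action = max_action

    def forward(self, state):
        # tanh squashes to [-1, 1]; multiplying by max_action rescales
        # directly to the environment's action range, no wrapper needed.
        return self.max_action * torch.tanh(self.net(state))

actor = Actor(env.observation_space.shape[0], env.action_space.shape[0], max_action)
```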

And yes, _max_episode_steps is not part of gym.ActionWrapper (I don't understand why I used it there). You can check how _max_episode_steps works here:
https://github.com/openai/gym/blob/85a5372a19c0f35db2410e586cc9a32c4d94bf1a/gym/wrappers/time_limit.py
https://github.com/openai/gym/blob/239aaf14ce804c9ce5068bfb69590110ea8ef1be/gym/envs/registration.py
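In short, gym.make wraps registered environments in TimeLimit, which stores the registered step limit as a plain attribute, so the mask computation in main.py only needs that attribute. A quick sketch (using Pendulum-v0 and a random action as stand-ins for whatever env and policy you run):

```python
import gym

# gym.make wraps registered envs in gym.wrappers.TimeLimit, which stores
# the registered max_episode_steps as a plain attribute (not a method).
env = gym.make("Pendulum-v0")
print(env._max_episode_steps)  # e.g. 200 for Pendulum-v0

state, episode_steps, done = env.reset(), 0, False
while not done:
    action = env.action_space.sample()  # stand-in for the SAC policy
    next_state, reward, done, _ = env.step(action)
    episode_steps += 1
    # Ignore the "done" signal when it only comes from hitting the time
    # limit, since that termination is not part of the true MDP.
    mask = 1 if episode_steps == env._max_episode_steps else float(not done)
    state = next_state
```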

pranz24 commented 5 years ago

@Phlogiston90 https://github.com/pranz24/pytorch-soft-actor-critic/pull/13

Phlogiston90 commented 5 years ago

Thanks a lot! :-)