haarnoja / softqlearning

Reinforcement Learning with Deep Energy-Based Policies
https://arxiv.org/abs/1702.08165
416 stars 94 forks

Major code refactoring #6

Closed haarnoja closed 6 years ago

haarnoja commented 6 years ago

This pull request replaces large parts of the implementation with soft actor-critic code for better compatibility and easier maintenance. It also changes how action bounds are enforced: InputBounds is replaced with a squashing function (tanh).
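For readers unfamiliar with tanh squashing, here is a minimal NumPy sketch of the general idea. It is not the code from this pull request: the function name `squash_action` and the `low`/`high` parameters are hypothetical and chosen only for illustration. The policy output is treated as unbounded, and tanh maps it into the allowed action range.

```python
import numpy as np


def squash_action(raw_action, low=-1.0, high=1.0):
    """Map an unbounded action into [low, high] using tanh.

    Hypothetical helper for illustration; not part of the repository.
    """
    raw_action = np.asarray(raw_action, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # tanh maps any real number into [-1, 1] (it saturates for large inputs).
    squashed = np.tanh(raw_action)
    # Rescale from [-1, 1] to [low, high].
    return low + 0.5 * (squashed + 1.0) * (high - low)


# Unbounded policy outputs, including values far outside the bounds.
raw = np.array([-10.0, 0.0, 10.0])
bounded = squash_action(raw, low=-2.0, high=2.0)
```

Because tanh is smooth, the bounds hold by construction and gradients still flow through the squashed action. If the policy is stochastic and its log-density is needed, the change of variables introduced by tanh has to be accounted for separately.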