Closed (kengz closed 5 years ago)
Hi, just wanted to point out the follow-up paper by the authors of SAC: https://arxiv.org/abs/1812.05905. One of the main differences from the original paper is that they don't use a separate V network.
@CarloLucibello Thanks for pointing that out. This PR implements the first version of SAC and adds a discrete control version. I'll implement the improved version in the next PR.
Feature / Fix
softplus to log-clamp-exp transform. Verified working on PPO.

BipedalWalkerRoboschool (continuous control) Benchmark
[four benchmark graphs]
LunarLander (discrete control) Benchmark
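
For reference, the softplus to log-clamp-exp change listed under Feature / Fix could look roughly like the sketch below. This is an illustration under assumptions, not the code from this PR: it assumes the transform maps an unconstrained network output to a positive value (such as a policy's standard deviation), and the names `softplus`, `log_clamp_exp`, `LOG_MIN`, and `LOG_MAX`, as well as the bound values, are hypothetical.

```python
import math

# Hypothetical clamp bounds on the log scale; these values are assumptions,
# not taken from the PR.
LOG_MIN = -20.0
LOG_MAX = 2.0


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x)). Output is non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_clamp_exp(x: float, lo: float = LOG_MIN, hi: float = LOG_MAX) -> float:
    """Treat x as a log-value, clamp it to [lo, hi], then exponentiate.

    The result is always within [exp(lo), exp(hi)], so it can neither
    collapse to zero nor grow without bound.
    """
    clamped = min(max(x, lo), hi)
    return math.exp(clamped)


if __name__ == "__main__":
    for raw in (-100.0, 0.0, 1.0, 100.0):
        print(raw, softplus(raw), log_clamp_exp(raw))
```

The practical difference in this sketch is that softplus grows without bound for large inputs and underflows toward zero for very negative ones, while the clamped version keeps the output inside a fixed positive range.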