Closed: mahaozhe closed this issue 10 months ago.
Hello, sorry for the late answer. I also recall having some difficulty getting SAC (and sometimes other algorithms) to converge on a supposedly trivial task such as MountainCar.
Just off the top of my head, one avenue worth exploring: encouraging more exploration with a higher --alpha (i.e., more policy noise) could help overcome the local optimum.
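In SAC, α weights the entropy bonus in the actor objective E[Q(s,a) − α log π(a|s)], so it directly controls how much action noise the policy keeps. The toy below is a hedged, single-step illustration of that effect, not CleanRL code: it assumes an unsquashed Gaussian policy, ignores future returns and tanh squashing, and uses only a MountainCarContinuous-style action cost −c·a² as the reward. Under those assumptions the best standard deviation is σ* = sqrt(α / 2c), so a larger α keeps the policy noisier. Also worth checking: CleanRL's `sac_continuous_action.py` tunes α automatically by default (`--autotune`), in which case a fixed `--alpha` value only takes effect when autotuning is disabled.

```python
import math

# Toy single-step model (assumption, not CleanRL code): policy a ~ N(mu, sigma^2),
# reward = -c * a^2 only, no tanh squashing, no bootstrapped future returns.


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy of N(mu, sigma^2) in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)


def soft_objective(sigma: float, alpha: float, action_cost: float, mu: float = 0.0) -> float:
    """Expected reward E[-c * a^2] plus the entropy bonus alpha * H(pi)."""
    expected_reward = -action_cost * (mu**2 + sigma**2)
    return expected_reward + alpha * gaussian_entropy(sigma)


def optimal_sigma(alpha: float, action_cost: float) -> float:
    """Closed form: d/dsigma [-c*sigma^2 + alpha*log(sigma)] = 0  =>  sigma^2 = alpha / (2c)."""
    return math.sqrt(alpha / (2.0 * action_cost))


if __name__ == "__main__":
    c = 0.1  # per-step action-cost coefficient used by MountainCarContinuous
    grid = [i / 1000 for i in range(1, 5001)]  # candidate sigmas in (0, 5]
    for alpha in (0.01, 0.05, 0.2, 1.0):
        # A brute-force grid search should agree with the closed-form optimum.
        best = max(grid, key=lambda s: soft_objective(s, alpha, c))
        print(f"alpha={alpha:<5} sigma*={optimal_sigma(alpha, c):.3f}  grid={best:.3f}")
```

In the real algorithm the critic's estimate of reaching the goal also shapes the policy, so this only shows the direction of the effect: raising α trades expected reward for wider exploration.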
Hey, I had a similar issue with discrete SAC and PPO performance when adapting them from training on Atari to solving Minigrid. I thought there must be some issue with the algorithm or environment setup, but apparently I just had to spend more time fine-tuning the hyperparameters. Fortunately, you don't have to do it by hand, as there are libraries that can handle it for you (like Optuna). I have to admit, though, that SAC takes much more time to converge properly.
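To make the idea of automated tuning concrete, here is a minimal random-search sketch using only the standard library. `train_and_evaluate` is a hypothetical placeholder with a synthetic score so the loop runs end to end; in practice it would launch a training run and return the mean evaluation return. The search ranges are illustrative assumptions. Optuna automates the same loop with smarter samplers (TPE by default) and early pruning of bad trials; with Optuna, `sample_params` would become calls like `trial.suggest_float("alpha", 1e-3, 1.0, log=True)`.

```python
import math
import random


def train_and_evaluate(params: dict) -> float:
    """HYPOTHETICAL placeholder: replace with a real training run that returns
    the mean evaluation return. This synthetic score peaks at an arbitrary
    point purely so the search loop can be exercised."""
    return -(math.log10(params["alpha"]) + 1.0) ** 2 - (math.log10(params["learning_rate"]) + 3.5) ** 2


def sample_params(rng: random.Random) -> dict:
    """Sample hyperparameters log-uniformly; the ranges are illustrative assumptions."""
    return {
        "alpha": 10 ** rng.uniform(-3, 0),           # entropy coefficient in [1e-3, 1]
        "learning_rate": 10 ** rng.uniform(-5, -2),  # optimizer step size in [1e-5, 1e-2]
    }


def random_search(n_trials: int, seed: int = 0) -> tuple[dict, float]:
    """Evaluate n_trials random configurations and return the best one and its score."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample_params(rng)
        score = train_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    best_params, best_score = random_search(n_trials=100, seed=42)
    print(best_params, best_score)
```

Because RL returns are noisy, a real objective would usually average the score over several seeds per configuration before comparing trials.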
You could also check out Tips and tricks; it may help you spot the issue.
Hope that helps!
Thanks a lot for your comments, @fr30!
Problem Description
When I ran experiments on the "MountainCarContinuous-v0" environment, I found that sac_continuous_action.py can't converge to the optimal policy. Compared with rpo_continuous_action.py, SAC stays stuck at a local optimum without any increase in returns.

The learning records from RPO:
The learning records from SAC:
We can see that SAC converges to around 0, while RPO converges to around 100 (the optimal policy) much faster.

Checklist
poetry install (see CleanRL's installation guideline)

Current Behavior
In the "MountainCarContinuous-v0" environment, the SAC algorithm only converges to an episodic return of around 0 (the agent fails to complete the task in every episode).

Expected Behavior
We expect SAC to also converge to an episodic return of around 100.

Possible Solution
I tried different hyperparameters and running more episodes, but I couldn't get the expected results.
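One plausible reason extra tuning and longer training don't help is that the reward structure makes "do nothing" a strong local optimum. The sketch below computes undiscounted episode returns using the reward described in Gymnasium's MountainCarContinuous documentation (−0.1·a² per step, +100 on reaching the goal, episodes truncated at 999 steps, actions in [−1, 1]). The action sequences in the usage note are made-up examples, not measurements from either algorithm.

```python
# Reward structure of MountainCarContinuous-v0 (per the Gymnasium docs):
#   every step:         r_t = -0.1 * a_t^2
#   on reaching goal:   +100 added to that step's reward
#   time limit:         999 steps; actions are clipped to [-1, 1]
ACTION_COST = 0.1
GOAL_BONUS = 100.0
MAX_STEPS = 999


def episode_return(actions: list[float], reached_goal: bool) -> float:
    """Undiscounted return of one episode given its (already clipped) actions."""
    cost = sum(ACTION_COST * a * a for a in actions)
    return (GOAL_BONUS if reached_goal else 0.0) - cost


if __name__ == "__main__":
    # Idle policy: never moves, never pays action cost, never reaches the goal.
    print("idle:", episode_return([0.0] * MAX_STEPS, reached_goal=False))
    # Full-throttle exploration that still fails: pays the maximum action cost.
    print("explore, fail:", episode_return([1.0] * MAX_STEPS, reached_goal=False))
    # Hypothetical success after 120 full-throttle steps.
    print("success:", episode_return([1.0] * 120, reached_goal=True))
```

Until exploration happens to reach the goal, every bit of action noise only lowers the return, so a policy that shrinks its actions toward zero looks like an improvement; this is consistent with the plateau at around 0 described above.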
Steps to Reproduce
I hope you can give me some suggestions for fine-tuning the hyperparameters or modifying the algorithm. Thanks a lot!