vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
4.91k stars, 566 forks

SAC cannot converge to optimal policy #410

Closed by mahaozhe 10 months ago

mahaozhe commented 11 months ago

Problem Description

When I run experiments on the "MountainCarContinuous-v0" environment, sac_continuous_action.py does not converge to the optimal policy. Compared with rpo_continuous_action.py, SAC stays stuck in a local optimum and its returns never increase.

SAC converges to an episodic return of around 0, while RPO converges to around 100 (the optimal policy) much faster.

Checklist

Current Behavior

In the "MountainCarContinuous-v0" environment, SAC only converges to episodic returns of around 0. (The agent does not reliably complete the task.)

Expected Behavior

SAC should also converge to episodic returns of around 100.

Possible Solution

I tried different hyper-parameters and ran more episodes, but I could not get the expected results.

Steps to Reproduce

I hope you can give me some suggestions for fine-tuning the hyper-parameters or updating the algorithm. Thanks a lot!

dosssman commented 10 months ago

Hello. Sorry for the late answer. I recall also having some difficulties getting SAC (and sometimes other algorithms) to converge on a supposedly trivial task such as MountainCar.

Just off the top of my head, one avenue worth exploring: encouraging more exploration with a higher --alpha could help overcome the local optimum.
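To make the local optimum concrete, here is a small arithmetic sketch of the reward structure documented for MountainCarContinuous-v0 (a cost of 0.1 * action² per step, +100 on reaching the goal, and a 999-step time limit). The three policies and the 120-step success length are hypothetical illustrations, not measurements from this issue.

```python
# Reward structure of MountainCarContinuous-v0 as documented in Gymnasium:
# each step costs 0.1 * action**2, and reaching the goal adds +100.
GOAL_BONUS = 100.0
ACTION_COST = 0.1
MAX_STEPS = 999  # default episode time limit


def episodic_return(actions, reached_goal):
    """Sum the per-step action penalty and add the goal bonus if earned."""
    penalty = sum(ACTION_COST * a ** 2 for a in actions)
    return (GOAL_BONUS if reached_goal else 0.0) - penalty


# Policy A: never push. The car stays in the valley but pays no penalty.
idle = episodic_return([0.0] * MAX_STEPS, reached_goal=False)

# Policy B: push at full throttle for the whole episode and still fail.
failed_push = episodic_return([1.0] * MAX_STEPS, reached_goal=False)

# Policy C: hypothetical successful run, full throttle for 120 steps.
success = episodic_return([1.0] * 120, reached_goal=True)

print(idle, failed_push, success)
```

Until the agent stumbles on the goal, every non-zero action only lowers the return, so "do nothing" (return 0) looks better than any failed attempt. That is consistent with SAC settling around 0, and it is why stronger exploration is a plausible thing to try.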

fr30 commented 10 months ago

Hey, I had a similar issue with discrete SAC and PPO performance when I adapted them from training on Atari to solving Minigrid. I thought there must be some issue with the algorithm or the environment setup, but apparently I just had to spend more time fine-tuning the parameters. Fortunately you don't have to do it by hand, as there are libraries that can handle that for you (like Optuna). I have to admit, though, that SAC needs much more time to converge properly.
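As a rough illustration of what automated tuning does, here is a minimal random-search sketch using only the standard library. It is a stand-in for Optuna, not Optuna itself: with Optuna, the sampling loop would be replaced by a study and an objective that calls `trial.suggest_float`. The function `train_and_evaluate`, its synthetic score, and the search ranges are all hypothetical. A real objective would launch a training run and return the mean episodic return.

```python
import math
import random


def train_and_evaluate(alpha, learning_rate):
    """Hypothetical stand-in for a full SAC training run.

    A real objective would train with these settings and return the mean
    episodic return. This synthetic score only exists so the search loop
    can run; it peaks at alpha=0.5, learning_rate=1e-3.
    """
    lr_term = (math.log10(learning_rate) + 3.0) ** 2
    return 100.0 - 40.0 * (alpha - 0.5) ** 2 - 10.0 * lr_term


def random_search(n_trials, seed=0):
    """Sample hyper-parameters at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_score, best_params = -math.inf, None
    for _ in range(n_trials):
        params = {
            "alpha": rng.uniform(0.05, 1.0),             # entropy coefficient
            "learning_rate": 10 ** rng.uniform(-5, -2),  # log-uniform sample
        }
        score = train_and_evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


best_score, best_params = random_search(n_trials=50)
print(best_score, best_params)
```

Sampling the learning rate on a log scale is a common choice because its useful values span several orders of magnitude. Each trial here is instant, whereas a real trial is a full training run, so the trial budget matters far more in practice.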

You could also check out "Tips and tricks". Maybe that will help you spot the issue.

Hope that helps!

mahaozhe commented 10 months ago

Thanks a lot for your comments, @fr30!