vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

What is the reason for returning mean in SAC get_action function if it's never used? #333

Open sudonymously opened 1 year ago

sudonymously commented 1 year ago

Problem Description

In the script sac_continuous_action.py, the get_action function in the Actor class returns action, log_prob, mean. action and log_prob are used, but mean is never used. Is there a reason to return that value when it's never used in the code? As a newcomer, it's a little confusing why it is needed.

Checklist

Current Behavior

Works as expected

Expected Behavior

Works as expected

Possible Solution

Remove the mean returned in the get_action function in Actor class

dosssman commented 1 year ago

Greetings. Sorry for the late answer.

In the original implementation, the mean is used for deterministic evaluation of the agent. Intuitively, using the mean corresponds to the greediest policy, and would result in maximal performance.

While CleanRL directly uses the episodic return collected during training (i.e. with stochastic action sampling), the mean is kept for compatibility with the original implementation.

Leaving mean in the code also makes it easy for researchers who build on top of the script to access it directly for their own experiments / evaluation.

Hope it helps.
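
For reference, the pattern discussed above could be sketched roughly as follows. This is a simplified toy actor, not the actual cleanrl code: the layer sizes, log-std clamping bounds, and the omission of action scaling/bias are all illustrative. The point is the third return value: `tanh(mean)` is the deterministic (greedy) action one would use for evaluation, while `action` is the stochastic sample used during training.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    # Minimal squashed-Gaussian actor, loosely mirroring the structure of
    # sac_continuous_action.py (simplified: no action scale/bias terms).
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.fc = nn.Linear(obs_dim, 64)
        self.fc_mean = nn.Linear(64, act_dim)
        self.fc_logstd = nn.Linear(64, act_dim)

    def get_action(self, x):
        h = torch.relu(self.fc(x))
        mean = self.fc_mean(h)
        log_std = self.fc_logstd(h).clamp(-5, 2)  # illustrative bounds
        normal = torch.distributions.Normal(mean, log_std.exp())
        x_t = normal.rsample()       # reparameterized sample (training)
        action = torch.tanh(x_t)     # squash to [-1, 1]
        # log-prob with the tanh change-of-variables correction
        log_prob = (normal.log_prob(x_t)
                    - torch.log(1 - action.pow(2) + 1e-6)).sum(1, keepdim=True)
        # third value: the deterministic "greedy" action for evaluation
        return action, log_prob, torch.tanh(mean)


# Deterministic evaluation: ignore the sample, act with tanh(mean).
actor = Actor(obs_dim=3, act_dim=1)
obs = torch.zeros(1, 3)
with torch.no_grad():
    _, _, det_action = actor.get_action(obs)
```

Because `det_action` depends only on the network weights and the observation, repeated calls on the same observation give the same action, which is what makes it suitable for evaluation rollouts.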