vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Remove the value function clipping #208

Closed vwxyzjn closed 10 months ago

vwxyzjn commented 2 years ago

Problem description

Per Andrychowicz et al. (2021) and anecdotal evidence, value function clipping is not useful, so we should remove the following code.

https://github.com/vwxyzjn/cleanrl/blob/94a685de9290435623d7cf5e4e770418ddb10a4f/cleanrl/ppo.py#L283-L291
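For readers who do not want to open the link, here is a minimal sketch of what PPO value function clipping generally looks like, written in plain Python rather than PyTorch. It is not copied from the linked lines; the function name `value_loss` and its parameters are hypothetical, and the `0.5` scaling and the default behaviour are assumptions based on the common PPO formulation.

```python
def value_loss(new_values, old_values, returns, clip_coef=None):
    """Mean squared value loss over a minibatch.

    Hypothetical helper for illustration only.
    clip_coef=None computes the plain (unclipped) loss, which is the
    behaviour this issue proposes to keep.
    """
    total = 0.0
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        loss_unclipped = (v_new - ret) ** 2
        if clip_coef is None:
            total += loss_unclipped
            continue
        # Limit how far the new prediction may move from the old one.
        delta = max(-clip_coef, min(clip_coef, v_new - v_old))
        v_clipped = v_old + delta
        loss_clipped = (v_clipped - ret) ** 2
        # Pessimistic choice: take the larger of the two losses.
        total += max(loss_unclipped, loss_clipped)
    return 0.5 * total / len(new_values)


# The new prediction already matches the return, but clipping still
# penalizes it for moving more than clip_coef away from the old value.
unclipped = value_loss([1.0], [0.0], [1.0])
clipped = value_loss([1.0], [0.0], [1.0], clip_coef=0.2)
```

In this toy case the unclipped loss is zero while the clipped loss is not, which shows how the clipped variant can penalize a prediction that is already accurate.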

We should do this with great care: run benchmark experiments to confirm that the removal results in the same or better performance in the games we test. That is, we should re-run the following benchmark and confirm that performance does not regress.

https://github.com/vwxyzjn/cleanrl/blob/94a685de9290435623d7cf5e4e770418ddb10a4f/benchmark/ppo.sh#L1-L59