Closed vwxyzjn closed 10 months ago
Problem description
Per Andrychowicz et al. (2021) and anecdotal evidence, value function clipping is not useful, so we should remove the following code.
https://github.com/vwxyzjn/cleanrl/blob/94a685de9290435623d7cf5e4e770418ddb10a4f/cleanrl/ppo.py#L283-L291
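For readers unfamiliar with the technique, here is a rough, framework-free sketch of what value function clipping usually looks like in PPO implementations, next to the plain squared-error loss that would remain after the removal. It is not a copy of the linked lines: the function names, argument names, and list-based inputs are made up for illustration, and the actual code in `ppo.py` works on tensors.

```python
def value_loss_unclipped(new_values, returns):
    """Plain value loss: 0.5 * mean((V_new - R)^2)."""
    errors = [(v - r) ** 2 for v, r in zip(new_values, returns)]
    return 0.5 * sum(errors) / len(errors)


def value_loss_clipped(new_values, old_values, returns, clip_coef):
    """Clipped value loss (illustrative sketch, hypothetical names).

    The new value prediction is only allowed to move `clip_coef` away
    from the old prediction; the loss takes the element-wise maximum of
    the clipped and unclipped squared errors.
    """
    total = 0.0
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        # Clamp the change in the value prediction to [-clip_coef, clip_coef].
        delta = max(-clip_coef, min(clip_coef, v_new - v_old))
        v_clipped = v_old + delta
        # Pessimistic choice: keep the larger of the two squared errors.
        total += max((v_new - ret) ** 2, (v_clipped - ret) ** 2)
    return 0.5 * total / len(new_values)


if __name__ == "__main__":
    new_values = [1.0, 0.0]
    old_values = [0.0, 0.0]
    returns = [1.0, 0.0]
    print(value_loss_unclipped(new_values, returns))
    print(value_loss_clipped(new_values, old_values, returns, clip_coef=0.2))
```

Because of the element-wise maximum, the clipped loss in this sketch is never smaller than the unclipped one. Removing the clipping branch would leave only the first function's computation.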
We should do this with great care, running benchmark experiments to confirm that the removal results in the same or better performance in the games we test. That is, we should re-run the following and confirm that the performance is acceptable.
https://github.com/vwxyzjn/cleanrl/blob/94a685de9290435623d7cf5e4e770418ddb10a4f/benchmark/ppo.sh#L1-L59