Closed Howuhh closed 2 years ago
If this is really a problem I will make a fix PR.
Oh this makes sense! Thanks for raising the issue. I think ppo_procgen.py and ppg_procgen.py also use the reward normalization - feel free to submit a PR to fix them.
Fixed.
Problem Description
The current implementation of continuous-action PPO uses
gym.wrappers.NormalizeReward
with its default gamma value, so for any gamma other than the default 0.99 this normalization will be incorrect. https://github.com/vwxyzjn/cleanrl/blob/94a685de9290435623d7cf5e4e770418ddb10a4f/cleanrl/ppo_continuous_action.py#L92

Possible Solution
Very easy to fix: just pass
gamma=args.gamma
as an argument to the normalization wrapper.
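To see why the mismatch matters, here is a minimal self-contained sketch (not the actual gym source) of what NormalizeReward does internally: it tracks a discounted return R_t = gamma * R_{t-1} + r_t and divides each reward by the running standard deviation of R_t, so the normalization statistics depend directly on gamma. The RunningMeanStd class and normalize_reward function below are illustrative stand-ins, not gym APIs.

```python
import numpy as np

class RunningMeanStd:
    # Minimal running mean/variance tracker, a sketch of the helper
    # that gym's NormalizeReward wrapper uses internally.
    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), len(x)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

def normalize_reward(rewards, gamma, eps=1e-8):
    # Maintain the discounted return R_t = gamma * R_{t-1} + r_t and
    # scale each reward by the running std of R_t. A different gamma
    # produces different statistics, hence a different reward scale.
    rms, ret, out = RunningMeanStd(), 0.0, []
    for r in rewards:
        ret = gamma * ret + r
        rms.update(np.array([ret]))
        out.append(r / np.sqrt(rms.var + eps))
    return out
```

Running this with gamma=0.99 versus gamma=0.9 on the same reward stream yields different normalized rewards, which is why the wrapper must be constructed with the experiment's own gamma (gamma=args.gamma) rather than the default.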