vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Action bias is added twice in DDPG algorithm implementation, similar to #259 #297

Closed · sdpkjc closed this issue 1 year ago

sdpkjc commented 1 year ago

Problem Description

A problem was reported in #259, and it appears to exist in DDPG as well (ddpg_continuous_action.py & ddpg_continuous_action_jax.py).

The action bias is added twice to the action: first during the actor's forward pass, and a second time because the exploration noise is sampled from a distribution centered at actor.action_bias rather than at zero.

# In ddpg_continuous_action.py, line 176
# https://github.com/vwxyzjn/cleanrl/blob/17febbf88c28d3572c62ae6fa041921ef84290c4/cleanrl/ddpg_continuous_action.py#L176
actions += torch.normal(actor.action_bias, actor.action_scale * args.exploration_noise)

# In ddpg_continuous_action_jax.py, line 232
# https://github.com/vwxyzjn/cleanrl/blob/17febbf88c28d3572c62ae6fa041921ef84290c4/cleanrl/ddpg_continuous_action_jax.py#L232
jax.device_get(actions)[0] + np.random.normal(action_bias, action_scale * args.exploration_noise)[0]
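To see the effect concretely, here is a small NumPy sketch (a stand-in for the actor, with a hypothetical action space and noise setting chosen for illustration) showing that noise centered at action_bias shifts the expected exploration action by action_bias a second time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action space [low, high] = [0, 2] for illustration
low, high = 0.0, 2.0
action_scale = (high - low) / 2.0  # 1.0
action_bias = (high + low) / 2.0   # 1.0
exploration_noise = 0.1

# The actor's forward pass already applies the bias: tanh(x) * scale + bias
actor_action = np.tanh(0.0) * action_scale + action_bias  # = 1.0, mid-range

# Exploration noise centered at action_bias (the reported behavior)
noisy = actor_action + rng.normal(action_bias, action_scale * exploration_noise,
                                  size=100_000)

# The mean exploration action is actor_action + action_bias, not actor_action
print(noisy.mean())  # ≈ 2.0: the bias was effectively added twice
```

The actor intends to act at the center of the action space (1.0), but the biased noise pushes the average sampled action to the upper boundary.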

Checklist

Possible Solution
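One possible fix (a sketch, not necessarily the exact change the maintainers adopted) is to center the exploration noise at zero, so the bias enters only through the actor's forward pass. Using the same hypothetical action space as an illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action space [low, high] = [0, 2] for illustration
low, high = 0.0, 2.0
action_scale = (high - low) / 2.0  # 1.0
action_bias = (high + low) / 2.0   # 1.0
exploration_noise = 0.1

# The actor's forward pass already applies the bias once
actor_action = np.tanh(0.0) * action_scale + action_bias  # = 1.0

# Fixed: noise centered at zero, scaled by action_scale * exploration_noise,
# mirroring e.g. torch.normal(0, actor.action_scale * args.exploration_noise)
noisy = actor_action + rng.normal(0.0, action_scale * exploration_noise,
                                  size=100_000)

# Clipping to the action-space bounds would follow here, as in the scripts
print(noisy.mean())  # ≈ 1.0: the bias is now applied only once
```

With zero-centered noise, the expected exploration action equals the actor's output, and action_bias no longer appears in the sampling step at all.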