vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

TD3 policy noise bugs #279

Closed tomjur closed 1 year ago

tomjur commented 1 year ago

Problem Description

There are two bugs in the policy noise at https://github.com/vwxyzjn/cleanrl/blob/e466f6efb251462de2b80f064133f32cb5d83e22/cleanrl/td3_continuous_action.py#L209: (1) the same noise is used for all actions in the batch, and (2) the action scale is not taken into account when generating the noise.

Current Behavior

(1) The noise takes its shape from actions[0], a single action, so the same noise is applied to every action in the batch. (2) No scaling is applied to the noise, even though the policy can have a different action scale (see https://github.com/vwxyzjn/cleanrl/blob/e466f6efb251462de2b80f064133f32cb5d83e22/cleanrl/td3_continuous_action.py#L114).
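A minimal sketch of the shape problem in (1), using stand-in tensors rather than the script's real variables. The sizes and the names target_mu and single_noise are assumptions made for illustration only.

```python
import torch

torch.manual_seed(0)
batch_size, act_dim = 4, 3  # hypothetical sizes, not taken from the script

# Stand-in for the target actor's output on a sampled batch.
target_mu = torch.zeros(batch_size, act_dim)

# Noise shaped like one action, as happens when the shape comes from actions[0].
single_noise = torch.randn(act_dim)

# Broadcasting adds the same noise vector to every row of the batch.
noisy = target_mu + single_noise
rows_identical = bool((noisy == noisy[0]).all())
```

Here rows_identical is True: every action in the batch receives identical noise.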

Expected Behavior

(1) The noise should take the shape of data.actions. (2) The noise should be scaled according to the policy's action scale.

Possible Solution

(1) Replace torch.Tensor(actions[0]) with torch.Tensor(data.actions). (2) Multiply the noise by target_actor.action_scale.
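The two changes could look roughly like the sketch below. This is not the code from the repository: the helper name smoothed_target_actions, the action_bias argument, and the default noise values are assumptions made for illustration.

```python
import torch


def smoothed_target_actions(target_mu, action_scale, action_bias,
                            policy_noise=0.2, noise_clip=0.5):
    """Hypothetical helper: add clipped, scaled noise to a batch of target actions."""
    # (1) Sample noise with the full batch shape, so each row gets its own noise.
    noise = torch.randn_like(target_mu) * policy_noise
    # (2) Clip in normalized units, then scale to the policy's action range.
    clipped_noise = noise.clamp(-noise_clip, noise_clip) * action_scale
    # Keep the result inside the action bounds (assumed to be bias +/- scale).
    low = action_bias - action_scale
    high = action_bias + action_scale
    return torch.max(torch.min(target_mu + clipped_noise, high), low)
```

With target_mu of shape (batch_size, act_dim), the returned tensor has the same shape, and dimensions with a larger action_scale receive proportionally larger noise.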

Steps to Reproduce

  1. Run the script https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py, stop on line 209 to view shapes.

dosssman commented 1 year ago

Thanks a lot for the heads up. I have added the fix in PR #281, mostly as suggested. Not sure if it will have much impact on the results, as the noise application process is ... well, noisy.

tomjur commented 1 year ago

Right, probably so (: Thanks for the quick bug-fix!