TD3 policy noise bugs

tomjur commented 1 year ago

Problem Description

There are two bugs in the policy noise at https://github.com/vwxyzjn/cleanrl/blob/e466f6efb251462de2b80f064133f32cb5d83e22/cleanrl/td3_continuous_action.py#L209: (1) the same noise is used for every action in the batch, and (2) the action scale is not taken into account for the noise.

Checklist

[x] I have installed dependencies via poetry install (see CleanRL's installation guideline).
[x] I have checked that there is no similar issue in the repo (required)

Current Behavior

(1) The noise takes its shape from actions[0], a single action rather than the sampled batch. (2) No scaling is applied to the noise, even though the policy can have a different action scale (see https://github.com/vwxyzjn/cleanrl/blob/e466f6efb251462de2b80f064133f32cb5d83e22/cleanrl/td3_continuous_action.py#L114 ).
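To illustrate point (1), here is a minimal sketch of what happens when noise shaped like a single action is added to a batch. The batch size, action dimension, and the `policy_noise` / `noise_clip` values are made up for the example, and `batch_actions` is only a stand-in for `data.actions`; this is not the CleanRL code itself.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes and hyperparameters, chosen only for illustration.
batch_size, action_dim = 4, 2
policy_noise, noise_clip = 0.2, 0.5

# Stand-in for the batch of target actions (all zeros so the noise is easy to see).
batch_actions = torch.zeros(batch_size, action_dim)
single_action = batch_actions[0]  # shape (action_dim,), like actions[0]

# Noise sized from one action only: shape (action_dim,).
shared_noise = (torch.randn_like(single_action) * policy_noise).clamp(
    -noise_clip, noise_clip
)

# Broadcasting silently copies the same noise vector onto every row.
noisy_shared = batch_actions + shared_noise
rows_identical = bool((noisy_shared == noisy_shared[0]).all())
print(shared_noise.shape, noisy_shared.shape, rows_identical)
```

Because the addition broadcasts without error, nothing fails at runtime; the only symptom is that every sample in the batch receives the same perturbation.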

Expected Behavior

(1) The noise should take the shape of data.actions. (2) The noise should be scaled according to the policy's action scale.

Possible Solution

(1) Replace torch.Tensor(actions[0]) with torch.Tensor(data.actions). (2) Multiply the noise by target_actor.action_scale.
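A rough sketch of how both suggestions could fit together is shown below. The helper name `smoothed_target_actions` and its arguments are hypothetical and do not come from the CleanRL script; `target_mean` stands in for the target actor's output on the batch, and `action_scale` for `target_actor.action_scale`. Whether the fix in the repository looks exactly like this is not confirmed here.

```python
import torch


def smoothed_target_actions(
    target_mean: torch.Tensor,   # (batch, action_dim), stand-in for target actor output
    action_scale: torch.Tensor,  # (action_dim,), stand-in for target_actor.action_scale
    action_low: torch.Tensor,    # (action_dim,) lower action bounds
    action_high: torch.Tensor,   # (action_dim,) upper action bounds
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
) -> torch.Tensor:
    """Hypothetical helper: per-sample, scale-aware target policy smoothing."""
    # (1) Draw independent noise for every action in the batch.
    noise = (torch.randn_like(target_mean) * policy_noise).clamp(
        -noise_clip, noise_clip
    )
    # (2) Scale the clipped noise by the policy's action scale.
    noise = noise * action_scale
    # Keep the perturbed actions inside the valid action range.
    noisy = target_mean + noise
    return torch.maximum(torch.minimum(noisy, action_high), action_low)


torch.manual_seed(0)
scale = torch.tensor([1.0, 2.0, 0.5])
out = smoothed_target_actions(torch.zeros(8, 3), scale, -scale, scale)
print(out.shape)
```

Multiplying after the clamp keeps `noise_clip` expressed in normalized action units, so the clip bound grows with the action scale; applying the scale before clipping would be a different design choice.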

Steps to Reproduce

Run the script https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py and break at line 209 to inspect the shapes.

dosssman commented 1 year ago

Thanks a lot for the heads up. I have added the fix in PR #281, mostly as suggested. Not sure if it will have much impact on the results, as the noise application process is ... well, noisy.