[RLlib][Bug] Duplicate action unsquashing in DDPG / TD3 policy

XuehaiPan commented 2 years ago

What happened + What you expected to happen

By default, normalize_actions is set to True in the Trainer config for Box action spaces.

https://github.com/ray-project/ray/blob/c0ec20dc3a3f733fda85dcf9cc71f83d51132276/rllib/agents/trainer.py#L258-L262

The policy's action output is therefore expected to be normalized to the range [-1, +1]. The normalized action is unsquashed (mapped back to the env's action space) before it is sent to the environment.

https://github.com/ray-project/ray/blob/c0ec20dc3a3f733fda85dcf9cc71f83d51132276/rllib/evaluation/sampler.py#L1232-L1233

However, for the DDPG / TD3 policy, the model output already lies within the bounds of the env's action space rather than in [-1, +1].

https://github.com/ray-project/ray/blob/c0ec20dc3a3f733fda85dcf9cc71f83d51132276/rllib/agents/ddpg/ddpg_torch_model.py#L103-L129

The policy output is first unsquashed by the _Lambda module in the model. It is then unsquashed a second time by unsquash_action:

https://github.com/ray-project/ray/blob/c0ec20dc3a3f733fda85dcf9cc71f83d51132276/rllib/evaluation/sampler.py#L1232-L1233
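To make the effect concrete, here is a minimal NumPy sketch of the double unsquashing. The `unsquash` helper is a hypothetical stand-in written for this illustration (an affine map from [-1, +1] onto the env bounds followed by a clip); it is not RLlib's code, and the bounds are those of Pendulum-v1, [-2, 2].

```python
import numpy as np


def unsquash(action, low, high):
    """Hypothetical helper: map [-1, 1] affinely onto [low, high], then clip."""
    scaled = low + (action + 1.0) * (high - low) / 2.0
    return np.clip(scaled, low, high)


low, high = -2.0, 2.0  # Pendulum-v1 torque bounds

# A correctly normalized policy output in [-1, 1] ...
normalized = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

# ... is unsquashed once and covers the env's action range evenly.
once = unsquash(normalized, low, high)

# If the model has already produced env-scale actions, the sampler's
# unsquash step is effectively applied a second time.
twice = unsquash(once, low, high)

print(once)   # [-2. -1.  0.  1.  2.]
print(twice)  # [-2. -2.  0.  2.  2.]
```

Under these assumptions, every env-scale action with magnitude of at least 1 is pinned to a bound after the second unsquash, so the environment only ever sees an undistorted action at 0.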

cc @sven1977

Versions / Dependencies

ray[rllib] = 1.12.0

Reproduction script

rllib train --run TD3 --framework torch --env Pendulum-v1

Issue Severity

Low: It annoys or frustrates me.

XuehaiPan commented 2 years ago

I found another action-squashing bug in DDPG.

https://github.com/ray-project/ray/blob/27e7c284ee628787bfdb86b066d24d91db90eb87/rllib/utils/exploration/random.py#L147-L167

The exploration outputs a random action that is already within the env's action bounds (line 161). That action is then unsquashed again before it is sent to the environment.
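A rough sketch of what this does to random exploration, again assuming a hypothetical `unsquash` helper (affine map plus clip) and Pendulum-v1 bounds of [-2, 2]: actions sampled uniformly from the env's action space stop being uniform once they are unsquashed.

```python
import numpy as np


def unsquash(action, low, high):
    """Hypothetical helper: map [-1, 1] affinely onto [low, high], then clip."""
    scaled = low + (action + 1.0) * (high - low) / 2.0
    return np.clip(scaled, low, high)


low, high = -2.0, 2.0  # Pendulum-v1 torque bounds
rng = np.random.default_rng(0)

# Random exploration: sample directly in the env's action bounds.
random_actions = rng.uniform(low, high, size=10_000)

# The sampler then unsquashes them as if they were in [-1, 1].
sent_to_env = unsquash(random_actions, low, high)

# Fraction of exploration actions that end up pinned to a bound.
saturated = np.mean((sent_to_env == low) | (sent_to_env == high))
print(f"fraction pinned to a bound: {saturated:.2f}")
```

In this sketch roughly half of the sampled actions saturate at a bound, since any sample with magnitude above 1 is clipped.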

JiahaoYao commented 2 years ago

Hi @XuehaiPan, this is a nice catch. Does that mean that if you set "normalize_actions": False, the issue will be resolved?

XuehaiPan commented 2 years ago

> Hi @XuehaiPan, does that mean if you change "normalize_actions": False, the issue will be resolved?

If we set "normalize_actions": False for DDPG and TD3 policies, yes.
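As a sketch of that workaround, the config override could look like the following. Only the "normalize_actions" key comes from this thread; the other keys mirror the reproduction command, and the commented-out `tune.run` call is an assumption about how one might launch it, not something tested here.

```python
# Workaround sketch: disable the sampler-side unsquashing for TD3 / DDPG,
# since the model already emits actions in the env's own bounds.
config = {
    "env": "Pendulum-v1",
    "framework": "torch",
    "normalize_actions": False,
}

# Assumed usage (requires ray[rllib]; stopping criterion is hypothetical):
# from ray import tune
# tune.run("TD3", config=config, stop={"training_iteration": 10})
```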
