thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Add reward shaping and policy shaping to DQN #279

Open zhujl1991 opened 3 years ago

zhujl1991 commented 3 years ago

Do we have any plans to add reward shaping and policy shaping to DQN? Looks like the reward shaping requires overwriting here, and policy shaping requires overwriting here.

Trinkle23897 commented 3 years ago

You're right. Reward shaping can also be implemented in a gym.Wrapper or in collector.preprocess_fn (if you treat F_D as part of the environment). Currently we don't have any plans, but you are welcome to make a PR that makes them compatible with all policies.
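For concreteness, here is a minimal sketch of what potential-based reward shaping (the F_D term) could look like if placed in a wrapper's step() or in a preprocess_fn. The function name, the potential `phi`, and the argument layout are illustrative assumptions, not an existing tianshou or gym API:

```python
def shaped_reward(reward, s, s_next, phi, gamma=0.99):
    """Return reward + F(s, s'), where F(s, s') = gamma * phi(s') - phi(s).

    phi is a user-supplied potential function over states; this
    signature is a hypothetical sketch, not tianshou's interface.
    """
    return reward + gamma * phi(s_next) - phi(s)


# Toy usage: potential = the state itself, discount 0.5.
r = shaped_reward(1.0, 0.0, 1.0, phi=lambda s: s, gamma=0.5)
print(r)  # 1.0 + 0.5 * 1.0 - 0.0 = 1.5
```

Because the shaping term depends only on the raw transition, the same function works whether it lives in an environment wrapper or in the collector's preprocessing hook.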

zhujl1991 commented 3 years ago

> You're right. Reward shaping can also be implemented in a gym.Wrapper or in collector.preprocess_fn (if you treat F_D as part of the environment). Currently we don't have any plans, but you are welcome to make a PR that makes them compatible with all policies.

AFAIU, the shaping is only applied to DQN. What do you mean by compatible with *all* policies?

Trinkle23897 commented 3 years ago

Because in section 3.3 (Policy Shaping) the authors don't claim that this algorithm is only compatible with DQN. My understanding is that this policy shaping method can be adapted to any off-policy algorithm with Q-value function approximation.
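As a sketch of why this generalizes: any Q-value-based method picks its greedy action with an argmax over Q(s, a), so policy shaping can be expressed as biasing that argmax with a per-action shaping term. The function name, the additive combination rule, and the `weight` parameter below are illustrative assumptions, not the paper's or tianshou's actual interface:

```python
def shaped_action(q_values, shaping, weight=1.0):
    """Greedy action over Q(s, a) + weight * F(s, a).

    q_values and shaping are per-action lists; the additive rule and
    the weight are a hypothetical sketch of policy shaping, applicable
    to any algorithm that exposes per-action Q-values.
    """
    scores = [q + weight * f for q, f in zip(q_values, shaping)]
    return max(range(len(scores)), key=scores.__getitem__)


# Without shaping, argmax of [1.0, 2.0] is action 1; the shaping
# term [2.0, 0.0] flips the greedy choice to action 0.
print(shaped_action([1.0, 2.0], [2.0, 0.0]))  # 0
```

Since C51 and QRDQN also reduce to per-action Q-value estimates at action-selection time, the same hook would apply to them.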

But even if the policy shaping could only be adapted to DQN, I don't think it would be hard to extend to the DQN family: C51, QRDQN, etc.