zhujl1991 opened 3 years ago
You're right, and reward shaping could also be written as a gym.Wrapper or in collector.preprocess_fn (treating F_D as part of the environment). Currently we don't have any plan, but you are welcome to make a PR to make them compatible with all policies.
AFAIU, the shaping is only applied to DQN. What do you mean by compatible with *all* policies?
Because in section 3.3 (Policy Shaping) the authors don't claim that this algorithm is only compatible with DQN. My understanding is that this policy shaping method can be adapted to any off-policy algorithm with Q-value function approximation.
But even if policy shaping could only be applied to DQN, I don't think it would be hard to extend it to the DQN family -- C51, QRDQN, etc.
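To sketch why this generalizes, here is one plausible (hypothetical) reading of policy shaping for any Q-value based policy: action selection mixes the learned Q-values with the demonstration term F_D. The combination rule and the weight below are my assumptions, not necessarily the paper's exact formula; the point is only that anything exposing Q(s, a) -- DQN, C51's expected values, QRDQN's quantile means -- could plug in the same way.

```python
# Hypothetical policy-shaping rule: greedy action over the learned
# Q-values plus a weighted demonstration bonus F_D(s, a).
# The additive form and the weight are assumptions for illustration.

def shaped_action(q_values, f_d, weight=1.0):
    """Return argmax over Q(s, a) + weight * F_D(s, a)."""
    scores = [q + weight * f for q, f in zip(q_values, f_d)]
    return scores.index(max(scores))


q = [1.0, 0.9, 0.2]   # learned Q-values for one state
fd = [0.0, 0.5, 0.0]  # demonstration bonus per action

print(shaped_action(q, fd))  # the demo bonus flips the greedy choice
```

Unshaped, the greedy action would be 0; with the bonus, action 1 wins. Since the rule only touches action selection, it could live in a thin subclass overriding the policy's forward/exploration step rather than in DQN itself.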
Do we have any plans to add reward shaping and policy shaping to DQN? It looks like reward shaping requires overwriting here, and policy shaping requires overwriting here.