kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

The results on MuJoCo reported in the paper might be heavily influenced by the env version #42


linprophet commented 2 years ago

Hello there,

Recently, we reproduced some experiments in offline reinforcement learning and found that Decision Transformer cites the CQL results from the original CQL paper. The problem is that DT uses MuJoCo version-2 environments (hopper-v2, walker2d-v2), while the original CQL uses version-0 environments (hopper-v0, walker2d-v0), and the reward scale differs between these environment versions. So we ran DT and CQL in the same environments (hopper-v2, walker2d-v2), and CQL outperformed DT on almost all tasks (except hopper-replay). So I wonder:

  1. Did you take the environment version into account when reporting the results?
  2. Referring to https://github.com/kzl/decision-transformer/issues/16, the score is normalized using an expert-policy reference from https://github.com/rail-berkeley/d4rl/blob/master/d4rl/infos.py (see the sketch below). However, the results I get from the official code are far from the results reported in the paper. Or did I miss some key component in the DT code?
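
For reference, here is a minimal sketch of the normalization I am using, based on the reference values in d4rl/infos.py. This is not code from the DT repo; it assumes a working d4rl install, and whether both the -v0 and -v2 dataset keys below are present depends on the installed d4rl version.

```python
# Minimal sketch: deriving the 0-100 normalized score in the paper from the
# per-dataset reference returns in d4rl/infos.py.
from d4rl import infos


def normalized_score(dataset_name: str, raw_return: float) -> float:
    """Map a raw episode return onto the 0-100 scale (random = 0, expert = 100)."""
    ref_min = infos.REF_MIN_SCORE[dataset_name]  # random-policy return
    ref_max = infos.REF_MAX_SCORE[dataset_name]  # expert-policy return
    return 100.0 * (raw_return - ref_min) / (ref_max - ref_min)


# The reference returns are tied to the dataset (and hence the env version), so the
# same raw return normalizes differently for v0-based and v2-based datasets.
for name in ("hopper-medium-v0", "hopper-medium-v2"):
    if name in infos.REF_MIN_SCORE:
        print(name, infos.REF_MIN_SCORE[name], infos.REF_MAX_SCORE[name])
```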

Looking forward to your reply!

Best Wishes