Hello there,

Recently, we reproduced some experiments in offline reinforcement learning and noticed that the Decision Transformer (DT) paper cites the CQL results from the original CQL paper. The problem is that DT uses MuJoCo v2 environments (hopper-v2, walker2d-v2), while the original CQL uses v0 environments (hopper-v0, walker2d-v0), and the reward scales differ between these versions. We therefore ran DT and CQL in the same environments (hopper-v2, walker2d-v2), and CQL outperformed DT on almost all tasks (except hopper-replay). So we wonder:
Did you take the environment version into account when comparing the results?
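One way to make returns comparable across environment versions with different reward scales is the D4RL normalization convention, which rescales each raw return against per-environment random and expert reference scores. A minimal sketch (the reference scores are parameters here; in practice they must come from the same environment version as the raw return):

```python
def normalized_score(raw_return: float, random_score: float, expert_score: float) -> float:
    """D4RL-style normalization: 0 corresponds to a random policy,
    100 to an expert policy, for the given environment version."""
    return 100.0 * (raw_return - random_score) / (expert_score - random_score)

# Example: a raw return halfway between the random and expert
# reference scores maps to a normalized score of 50.
print(normalized_score(50.0, 0.0, 100.0))
```

Comparing DT and CQL on normalized scores (with version-matched references) would remove the raw reward-scale discrepancy between v0 and v2.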
Looking forward to your reply!
Best wishes