Hiya,

While working through the DT implementation, I noticed that the provided configs for the locomotion tasks do not reproduce the performance shown in the wandb reports. On closer inspection, the cause is the `reward_scale` entry: the configs in the repo set `reward_scale: 1.0`, while the wandb reports (e.g. this one) show `reward_scale: 0.001`.

I also ran a small-scale experiment on `hopper-medium-replay-v2`; the performance of the updated config matched the one you report.
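
For context, here is a minimal sketch of why this value can matter so much. It assumes that `reward_scale` multiplies the returns-to-go the model is conditioned on, which is my reading of the setup rather than something checked line by line against the repo. The helper `returns_to_go` and the toy rewards are hypothetical and only illustrate the change in magnitude.

```python
def returns_to_go(rewards, reward_scale=1.0):
    """Undiscounted return-to-go per timestep, multiplied by reward_scale.

    Hypothetical helper for illustration; not the repo's implementation.
    """
    rtg = []
    running = 0.0
    # Accumulate from the last step backwards: rtg[t] = sum(rewards[t:]).
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return [reward_scale * g for g in rtg]


# Toy rewards, not real hopper data.
rewards = [1.0, 2.0, 3.0]

unscaled = returns_to_go(rewards, reward_scale=1.0)
scaled = returns_to_go(rewards, reward_scale=0.001)

print("reward_scale=1.0  :", unscaled)
print("reward_scale=0.001:", scaled)
```

With `reward_scale: 1.0` the conditioning values are three orders of magnitude larger than with `reward_scale: 0.001`, so a model run with one setting sees inputs on a very different scale than the reported runs did.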