tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
https://arxiv.org/abs/2210.07105
Apache License 2.0

Decision Transformer: fixed 'reward_scale' in configs as in wandb reports #75

Closed suessmann closed 1 year ago

suessmann commented 1 year ago

Hiya,

While working through the DT implementation, I noticed that the provided configs for locomotion tasks do not deliver the performance shown in the wandb reports. On closer inspection, the cause turned out to be the reward_scale entry: the configs in the repo had reward_scale: 1.0, while the wandb reports (e.g. this one) show reward_scale: 0.001.

I also ran a small-scale experiment on hopper-medium-replay-v2; the performance of the updated config matched the one you report.
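
For context, here is a rough sketch of why reward_scale matters so much for DT. It assumes, as is common in Decision Transformer implementations, that the undiscounted returns-to-go fed to the model are multiplied by reward_scale. The helper name `scaled_returns_to_go` and the toy rewards are made up for illustration and are not taken from the repo.

```python
from typing import List


def scaled_returns_to_go(rewards: List[float], reward_scale: float) -> List[float]:
    """Undiscounted returns-to-go for one trajectory, multiplied by reward_scale.

    Hypothetical helper for illustration only; not the repo's actual code.
    """
    rtg: List[float] = []
    running = 0.0
    # Accumulate rewards from the last step backwards.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return [g * reward_scale for g in rtg]


# Toy trajectory (made-up numbers, not from any D4RL dataset).
rewards = [1.0, 2.0, 3.0]

raw = scaled_returns_to_go(rewards, reward_scale=1.0)       # as in the old configs
scaled = scaled_returns_to_go(rewards, reward_scale=0.001)  # as in the wandb reports

print(raw)
print(scaled)
```

With reward_scale: 1.0 the return-to-go inputs are three orders of magnitude larger than with reward_scale: 0.001. On locomotion tasks, where episode returns can reach the thousands, that plausibly explains the gap between the repo configs and the reported runs.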