Return-to-go conditioning on Atari

kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

MIT License


Closed (geekyutao closed this issue 3 years ago)

geekyutao commented 3 years ago

Dear authors, great work on DT! I found that the return-to-go conditioning hyperparameters in Table 8 differ from the ones in the code at https://github.com/kzl/decision-transformer/blob/f04280e3668a992c41b38bdfb6b6181d61b4dc52/atari/mingpt/trainer_atari.py#L164. Which ones are correct?

Thanks

HonoMi commented 3 years ago

Thanks for opening the issue. We also found that the return-to-go for Qbert in the code seems to differ from Table 8:

eval_return = self.get_returns(14000)

Thanks!

lili-chen commented 3 years ago

Hi, those hyperparameters are from an earlier version of our paper. Feel free to replace them with the numbers in Table 8.
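For anyone applying this, one way to make the swap without editing the hard-coded branches each time is to read the target return from a small lookup table. The following is only a sketch, not the repository's code: `CODE_TARGET_RETURNS`, `target_return`, and the `overrides` argument are hypothetical names, and the only value included is the Qbert one quoted in this thread. The Table 8 numbers are deliberately left out and should be copied from the paper.

```python
from typing import Dict, Optional

# Value currently hard-coded in the code, as quoted in this thread.
# Other games are omitted here rather than guessed.
CODE_TARGET_RETURNS: Dict[str, int] = {"Qbert": 14000}


def target_return(game: str, overrides: Optional[Dict[str, int]] = None) -> int:
    """Return the return-to-go to condition on for `game`.

    `overrides` (hypothetical) is meant to hold values copied from
    Table 8 of the paper; they take precedence over the code defaults.
    """
    merged = {**CODE_TARGET_RETURNS, **(overrides or {})}
    if game not in merged:
        raise KeyError(f"no target return configured for {game!r}")
    return merged[game]
```

Assuming the trainer keeps its current structure, the call site would then look roughly like `eval_return = self.get_returns(target_return(self.config.game, table8))`, where `table8` is a dictionary you fill in from the paper.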