tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
https://arxiv.org/abs/2210.07105
Apache License 2.0

Question: about the layernorm on token input #40

Closed by typoverflow 1 year ago

typoverflow commented 1 year ago

Hi, I have a small question about DT. Here the token input is passed through a layer norm before the transformer blocks: https://github.com/tinkoff-ai/CORL/blob/2a7b88cfa8e25fbda77e2d3e55e4ee4267eeb431/algorithms/dt.py#L343-L346 However, the original implementation does not seem to use a layer norm at this point (ref: for Atari https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/atari/mingpt/model_atari.py#L260 and for MuJoCo https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/gym/decision_transformer/models/trajectory_gpt2.py#L687). Am I missing anything? 🤔

Howuhh commented 1 year ago

Hi @typoverflow! Yes, the layer norm on the token input is applied only in the gym implementation, which is the main reference for ours. It is not in the GPT-2 file you linked but in the model wrapper around it: https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/gym/decision_transformer/models/decision_transformer.py#L78
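To make the placement concrete, here is a minimal PyTorch sketch of a layer norm applied to the interleaved (return, state, action) tokens before they reach the transformer blocks. It is an illustration rather than the code from either repository: `TokenEmbedder`, its argument names, and the `use_layer_norm` switch are hypothetical, and the timestep embeddings and the transformer itself are left out for brevity.

```python
import torch
import torch.nn as nn


class TokenEmbedder(nn.Module):
    """Hypothetical helper: embeds returns, states and actions, interleaves
    them into one token sequence, and optionally layer-normalizes the result."""

    def __init__(self, state_dim, action_dim, embed_dim, use_layer_norm=True):
        super().__init__()
        self.return_emb = nn.Linear(1, embed_dim)
        self.state_emb = nn.Linear(state_dim, embed_dim)
        self.action_emb = nn.Linear(action_dim, embed_dim)
        # Layer norm over the embedding dimension of every token (assumed
        # placement: after interleaving, before the transformer blocks).
        self.emb_norm = nn.LayerNorm(embed_dim) if use_layer_norm else nn.Identity()

    def forward(self, returns, states, actions):
        # returns: [B, T, 1], states: [B, T, state_dim], actions: [B, T, action_dim]
        batch, seq_len = states.shape[0], states.shape[1]
        tokens = torch.stack(
            [self.return_emb(returns), self.state_emb(states), self.action_emb(actions)],
            dim=2,
        )  # [B, T, 3, E]
        # Flatten to [B, 3 * T, E], ordered as (R_1, s_1, a_1, R_2, s_2, a_2, ...)
        tokens = tokens.reshape(batch, 3 * seq_len, -1)
        return self.emb_norm(tokens)


torch.manual_seed(0)
embedder = TokenEmbedder(state_dim=17, action_dim=6, embed_dim=128)
out = embedder(torch.randn(4, 20, 1), torch.randn(4, 20, 17), torch.randn(4, 20, 6))
print(out.shape)  # torch.Size([4, 60, 128])
```

Passing `use_layer_norm=False` swaps the norm for `nn.Identity()`, which corresponds to feeding the raw embeddings straight into the transformer, as in the files linked in the question.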