From my understanding, after setting an initial target return for the entire episode, each time we receive a reward from the environment after taking a step, we update it as

new_target_reward = target_reward - received_reward_from_env

Is this correct?
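For concreteness, here is a minimal sketch of that update inside a rollout loop. The environment choice and the `select_action` helper are placeholders, not part of any particular Decision Transformer implementation:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

def select_action(obs, return_to_go):
    # Placeholder for the Decision Transformer forward pass: a real
    # model would condition on (return_to_go, obs, past context).
    return env.action_space.sample()

obs, _ = env.reset()
target_return = 200.0  # initial desired return, picked by hand
done = False
while not done:
    action = select_action(obs, target_return)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    # The update asked about: subtract the reward just received.
    target_return -= reward
```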
Multi-Game Decision Transformers (https://arxiv.org/abs/2205.15241) looks into learning the return-to-go the way a value function is learned, removing the need to specify it manually.
I understand that during training, at each time step, the transformer is fed the return-to-go. During inference, though, how would we compute the return-to-go, which needs to be supplied before each action? Do we do "desired reward" / episode_length?
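For what it's worth, in the original Decision Transformer setup the training-time return-to-go is not an average over the episode: it is the undiscounted sum of the remaining rewards in the logged trajectory, and at inference the initial value is a manually chosen target that is then decremented as in the formula above. A minimal sketch of the training-side computation (`returns_to_go` is just an illustrative helper name):

```python
import numpy as np

def returns_to_go(rewards):
    # RTG at step t is the sum of rewards from t to the end:
    # rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[-1]
    return np.cumsum(rewards[::-1])[::-1]

rewards = np.array([1.0, 0.0, 2.0, 1.0])
print(returns_to_go(rewards))  # [4. 3. 3. 1.]
```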