Hi, I'm trying to understand the following code in gym/experiment.py/get_batch():
( from https://github.com/kzl/decision-transformer/blob/master/gym/experiment.py#:~:text=rtg.append(discount_cumsum,1))%5D%2C%20axis%3D1) )
As far as I can understand it, it's creating a sequence of (tlen + 1) rtg values, then checking whether the sequence length is <= tlen, and padding it with an extra value if it is. (I'm struggling to see how this situation will ever arise.) A few lines later, the padding code is applied, pre-padding with 0s to make sure everything is length max_len, except for rtg, which will now be length max_len + 1.

I don't understand the purpose of this extra value, especially since it seems to get stripped anyway by the SequenceTrainer.
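To make my reading concrete, here is a minimal sketch of the shapes as I understand them. It is not the repository's code: `rtg_window` is a hypothetical helper, `discount_cumsum` is my own re-implementation of a plain returns-to-go sum, and the final `[:, :-1]` slice only stands in for the stripping I believe the trainer does.

```python
import numpy as np


def discount_cumsum(x, gamma):
    """Returns-to-go: out[t] = x[t] + gamma * out[t + 1] (my own re-implementation)."""
    out = np.zeros(len(x), dtype=np.float64)
    out[-1] = x[-1]
    for t in reversed(range(len(x) - 1)):
        out[t] = x[t] + gamma * out[t + 1]
    return out


def rtg_window(rewards, si, tlen, max_len):
    """Hypothetical helper mirroring my reading of the rtg handling in get_batch()."""
    # Take up to tlen + 1 returns-to-go values starting at index si.
    rtg = discount_cumsum(rewards[si:], gamma=1.0)[: tlen + 1].reshape(1, -1, 1)
    # If tlen or fewer values came back, append one extra zero.
    if rtg.shape[1] <= tlen:
        rtg = np.concatenate([rtg, np.zeros((1, 1, 1))], axis=1)
    # Pre-pad with zeros; rtg ends up with length max_len + 1.
    pad = np.zeros((1, max_len - tlen, 1))
    return np.concatenate([pad, rtg], axis=1)


rewards = np.ones(10)  # toy trajectory, assumed for illustration only
rtg = rtg_window(rewards, si=2, tlen=3, max_len=5)
print(rtg.shape)          # (1, 6, 1): one longer than max_len
print(rtg[:, :-1].shape)  # (1, 5, 1): after dropping the last value
```

With these toy numbers the other sequences would be padded to length 5, while rtg comes out with length 6 until the last value is dropped.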
Am I missing something? Thanks!