jerrodparker20 / adaptive-transformers-in-rl

Adaptive Attention Span for Reinforcement Learning

Regarding logic for first done indexes #17

Open victor-psiori opened 3 years ago

victor-psiori commented 3 years ago

Hi, thanks for the code and the paper on using adaptive attention span in RL. In train.py, I don't understand the logic for computing ind_first_done in the following line:
https://github.com/jerrodparker20/adaptive-transformers-in-rl/blob/6f75366b78998fb1d8755acd2d851c461c82ee75/train.py#L1240

After going through the loss calculations and the learn function where ind_first_done is used, I believe that line should instead be ind_first_done = padding_mask.long().argmax(0) + 1, since, per the comments, ind_first_done denotes the final index of each trajectory.

Could you kindly explain the logic behind this snippet?
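
For context, here is a minimal sketch of what the proposed expression computes, assuming padding_mask is a (T, B) boolean tensor that is True on and after a trajectory's first done step (this shape and convention are inferred from the discussion, not taken from train.py):

import torch

# Two trajectories over T = 4 steps; True marks padded (post-done) steps.
padding_mask = torch.tensor([
    [False, False],
    [False, True],   # trajectory 1 ends at step 1
    [True,  True],   # trajectory 0 ends at step 2
    [True,  True],
])

# argmax over dim 0 of the long-cast mask gives the index of the first
# True in each column (the first done step); +1 turns it into the
# length of the valid prefix of that trajectory.
ind_first_done = padding_mask.long().argmax(0) + 1
print(ind_first_done)  # tensor([3, 2])

Note that this relies on torch.argmax returning the index of the first maximal value, which recent PyTorch versions do.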

skkuai commented 2 years ago

Taking a hint from your issue, I modified the code as follows:

all_zero = (~padding_mask).all(dim=0)               # columns where no done ever occurs
ind_first_done = padding_mask.long().argmax(0) + 1  # index just past the first done
ind_first_done = (~all_zero) * ind_first_done + all_zero * T  # unfinished trajectories get the full length T

With this change, the model trained well. Thank you.
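
As a quick sanity check of the fix (under the same assumed (T, B) mask convention), the extra two lines matter exactly when a trajectory never finishes inside the unroll: argmax returns 0 for an all-False column, so the bare +1 would report a length of 1 instead of T:

import torch

T = 4
padding_mask = torch.tensor([
    [False, False],
    [False, False],
    [True,  False],  # trajectory 0 ends at step 2
    [True,  False],  # trajectory 1 never ends within the unroll
])

naive = padding_mask.long().argmax(0) + 1  # tensor([3, 1]) -- the 1 is wrong
all_zero = (~padding_mask).all(dim=0)      # tensor([False, True])
fixed = (~all_zero) * naive + all_zero * T
print(fixed)                               # tensor([3, 4])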