jerrodparker20 / adaptive-transformers-in-rl

Adaptive Attention Span for Reinforcement Learning

Regarding logic for first done indexes #17

Open victor-psiori opened 3 years ago

victor-psiori commented 3 years ago

Hi, thanks for the code and the paper on using adaptive attention span in RL. In train.py, I don't understand the logic for computing ind_first_done in the following line:
https://github.com/jerrodparker20/adaptive-transformers-in-rl/blob/6f75366b78998fb1d8755acd2d851c461c82ee75/train.py#L1240

After going through the loss calculations and the learn function where ind_first_done is used, I believe that line should instead be ind_first_done = padding_mask.long().argmax(0) + 1, since, per the comments, ind_first_done denotes the final index of each trajectory.

Could you kindly explain the logic behind this snippet?
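
For context, here is a minimal sketch of what the proposed expression computes, assuming padding_mask is a (T, B) boolean tensor that is True on and after a trajectory's first done step (this shape and convention are inferred from the discussion, not taken from train.py):

import torch

# Two trajectories over T = 4 steps; True marks padded (post-done) steps.
padding_mask = torch.tensor([
    [False, False],
    [False, True],   # trajectory 1 ends at step 1
    [True,  True],   # trajectory 0 ends at step 2
    [True,  True],
])

# argmax over dim 0 of the long-cast mask gives the index of the first
# True in each column (the first done step); +1 turns it into the
# length of the valid prefix of that trajectory.
ind_first_done = padding_mask.long().argmax(0) + 1
print(ind_first_done)  # tensor([3, 2])

Note that this relies on torch.argmax returning the index of the first maximal value, which recent PyTorch versions do.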

skkuai commented 2 years ago

Taking a hint from your issue, I modified the code as follows:

all_zero = (~padding_mask).all(dim=0)               # columns where no done ever occurs
ind_first_done = padding_mask.long().argmax(0) + 1  # index just past the first done
ind_first_done = (~all_zero) * ind_first_done + all_zero * T  # unfinished trajectories get the full length T

With this change, the model trained well. Thank you.
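
As a quick sanity check of the fix (under the same assumed (T, B) mask convention), the extra two lines matter exactly when a trajectory never finishes inside the unroll: argmax returns 0 for an all-False column, so the bare +1 would report a length of 1 instead of T:

import torch

T = 4
padding_mask = torch.tensor([
    [False, False],
    [False, False],
    [True,  False],  # trajectory 0 ends at step 2
    [True,  False],  # trajectory 1 never ends within the unroll
])

naive = padding_mask.long().argmax(0) + 1  # tensor([3, 1]) -- the 1 is wrong
all_zero = (~padding_mask).all(dim=0)      # tensor([False, True])
fixed = (~all_zero) * naive + all_zero * T
print(fixed)                               # tensor([3, 4])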