Open · victor-psiori opened this issue 3 years ago
Hi, thanks for the code and the paper on using adaptive attention span in RL. In `train.py`, I haven't understood the logic for calculating `ind_first_done` in the following line: https://github.com/jerrodparker20/adaptive-transformers-in-rl/blob/6f75366b78998fb1d8755acd2d851c461c82ee75/train.py#L1240

After going through the loss calculations and the `learn` function where `ind_first_done` is used, I feel that line should instead be:

```python
ind_first_done = padding_mask.long().argmax(0) + 1
```

I feel so because, from the comments, `ind_first_done` denotes the final index of each trajectory. Could you kindly explain the logic used in the mentioned snippet?
---

I took a hint from your issue and modified the code like this:

```python
all_zero = (~padding_mask).all(dim=0)  # trajectories where padding_mask is all False (no done step)
ind_first_done = padding_mask.long().argmax(0) + 1  # index of the first True step, plus one
ind_first_done = (~all_zero) * ind_first_done + all_zero * T  # unfinished trajectories get the full length T
```

Then the model trained well. Thank you.
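For anyone who lands here with the same question, below is a minimal, self-contained sketch of why the bare `argmax(0) + 1` is not enough. It assumes `padding_mask` is a `[T, B]` boolean tensor with `True` at and after each trajectory's first done step; the shapes and values are made up for illustration, not taken from the repository.

```python
import torch

T, B = 5, 3  # hypothetical trajectory length and batch size
# True marks done/padded steps; the third trajectory never finishes
padding_mask = torch.tensor([
    [False, False, False],
    [False, True,  False],
    [True,  True,  False],
    [True,  True,  False],
    [True,  True,  False],
])

# Naive version: argmax returns the FIRST True index along dim 0, but when a
# column is all False it also returns 0, so "+ 1" wrongly yields 1 there.
naive = padding_mask.long().argmax(0) + 1
print(naive)  # tensor([3, 2, 1])  <- last entry should be T = 5

# Corrected version from the comment above:
all_zero = (~padding_mask).all(dim=0)  # columns with no done step at all
fixed = (~all_zero) * naive + all_zero * T
print(fixed)  # tensor([3, 2, 5])
```

The key point is that `torch.argmax` returns index 0 both when the first step is done and when a column contains no done step at all, so trajectories that never finish would otherwise be assigned length 1 instead of `T`.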