lucidrains / reformer-pytorch

Reformer, the efficient Transformer, in Pytorch
MIT License

TOKEN_SELF_ATTN_VALUE and QK attention #134

Open lcmeng opened 3 years ago

lcmeng commented 3 years ago

Thanks for sharing the good work. I have a couple of questions about the constant TOKEN_SELF_ATTN_VALUE and how it is used.

TOKEN_SELF_ATTN_VALUE is first defined here in reformer_pytorch.py with a comment saying "carefully set for half precision to work". Later, it's used in both LSHAttention and FullQKAttention to mask out a token's attention to itself, except when no other attention targets are available.
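
To make sure I'm reading it correctly, here is a minimal toy sketch of how I understand the two masking values interact in the shared-QK case. The value -5e4 and the max_neg_value helper below are my paraphrase of what I see in the repo, not a verbatim copy, and the shapes are purely illustrative:

```python
import torch

# As I read reformer_pytorch.py: a large negative number that still fits
# comfortably in float16 (whose largest magnitude is ~65504), so softmax
# in half precision doesn't overflow to -inf / NaN.
TOKEN_SELF_ATTN_VALUE = -5e4

def max_neg_value(t):
    # Most negative finite value for the tensor's dtype; my assumption is that
    # "hard" masks (padding / causal) use something like this.
    return -torch.finfo(t.dtype).max

b, n, d = 1, 4, 8
qk = torch.randn(b, n, d)

# Shared-QK attention scores
dots = torch.einsum('bie,bje->bij', qk, qk) * (d ** -0.5)

# Soft self-mask: the diagonal gets TOKEN_SELF_ATTN_VALUE, which is very
# negative but finite, so a token normally never attends to itself...
i = torch.arange(n)
self_mask = i[:, None] == i[None, :]
dots.masked_fill_(self_mask, TOKEN_SELF_ATTN_VALUE)

# ...unless every other target is hard-masked with an even more negative
# value, in which case the self position still wins the softmax.
hard_mask = torch.zeros(b, n, n, dtype=torch.bool)
hard_mask[:, 0, 1:] = True                 # token 0 has no other valid targets
dots.masked_fill_(hard_mask, max_neg_value(dots))

attn = dots.softmax(dim=-1)
print(attn[0, 0])   # ~[1, 0, 0, 0]: falls back to attending to itself
print(attn[0, 1])   # self weight ~0: attends only to the other tokens
```

If that reading is right, the "carefully set" part is that -5e4 has to be negative enough to zero out self-attention after softmax, yet small enough in magnitude to remain representable in float16 and to stay above the hard mask value.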

Thank you!