lucidrains / reformer-pytorch

Reformer, the efficient Transformer, in Pytorch
MIT License

A question about allow_duplicate_attention #77

Closed L-Hugh closed 4 years ago

L-Hugh commented 4 years ago

https://github.com/lucidrains/reformer-pytorch/blob/352214dc4d2fb13c6018706ec226e60145a7c857/reformer_pytorch/reformer_pytorch.py#L305 Why `locs2 = (locs1 + 1) % chunk_size` rather than `locs2 = (locs1 - 1 + chunk_size) % chunk_size`? Q attends to the current chunk and the previous chunk, not the next chunk. Please correct me if I'm wrong. Thank you!
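
For reference, a minimal sketch of the arithmetic difference between the two expressions (plain Python; `chunk_size` and `locs1` are names taken from the question, with a small toy value assumed):

```python
chunk_size = 4  # hypothetical number of chunks, just for illustration
locs1 = list(range(chunk_size))  # a chunk index for each position

# the line as written: wraps each index to the *next* chunk
next_chunk = [(l + 1) % chunk_size for l in locs1]

# the suggested alternative: wraps each index to the *previous* chunk
prev_chunk = [(l - 1 + chunk_size) % chunk_size for l in locs1]

print(next_chunk)  # [1, 2, 3, 0]
print(prev_chunk)  # [3, 0, 1, 2]
```

So the two formulas are not equivalent; they point at opposite neighbouring chunks, which is what the question is about.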

lucidrains commented 4 years ago

@L-Hugh Hello! I am not too sure, actually; I faithfully transcribed this from the trax implementation, https://github.com/google/trax/blob/master/trax/layers/research/efficient_attention.py#L948, but didn't fully understand this section of the code.

FYI: when I chatted with the Reformer team, they mentioned that for many of their tasks this setting didn't make much of a difference and existed only for correctness' sake.

lucidrains commented 4 years ago

@L-Hugh I think that line may be doing something different from what you are thinking. Attending to the previous chunk is, I believe, taken care of by this line: https://github.com/lucidrains/reformer-pytorch/blob/352214dc4d2fb13c6018706ec226e60145a7c857/reformer_pytorch/reformer_pytorch.py#L321
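
To illustrate, here is a minimal runnable sketch of the `look_one_back` pattern used at that line (this is my reading of it; tensor layout `(batch, chunks, positions)` assumed for the toy example): each chunk of keys/values is concatenated with the chunk before it, with a cyclic wrap at the boundary.

```python
import torch

def look_one_back(x):
    # roll the chunk axis by one so each position i holds chunk i-1
    # (the last chunk wraps around to sit in front of chunk 0)
    x_extra = torch.cat([x[:, -1:, ...], x[:, :-1, ...]], dim=1)
    # concatenate along the within-chunk axis: chunk i now contains
    # its own positions followed by chunk i-1's positions
    return torch.cat([x, x_extra], dim=2)

# toy example: 1 batch, 3 chunks of 2 positions each
x = torch.arange(6.).reshape(1, 3, 2)
y = look_one_back(x)
print(y.shape)          # torch.Size([1, 3, 4])
print(y[0, 1].tolist()) # [2.0, 3.0, 0.0, 1.0] -- chunk 1 plus chunk 0
```

So queries in chunk `i` end up seeing keys from chunks `i-1` and `i`, independently of whatever bookkeeping the `locs2` line is doing.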