Closed beleen23 closed 1 year ago
I have the same question
yup, look_forward and look_backward are in multiples of the window size
your diagram is correct!
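For anyone who wants to see the block-wise shift concretely, here is a minimal pure-Python sketch of the pattern described above. It is not the library's code: `block_local_mask` and `render` are hypothetical helper names, and it ignores causal masking and padding. It only assumes that every token in a block attends to its own block plus `look_backward` blocks before and `look_forward` blocks after.

```python
def block_local_mask(seq_len, window_size, look_backward=1, look_forward=1):
    """Return mask[i][j] == True when query token i may attend to key token j.

    Hypothetical helper: attention is decided per block, not per token.
    """
    mask = []
    for i in range(seq_len):
        q_block = i // window_size  # block that the query token lives in
        row = []
        for j in range(seq_len):
            k_block = j // window_size  # block that the key token lives in
            row.append(q_block - look_backward <= k_block <= q_block + look_forward)
        mask.append(row)
    return mask


def render(mask):
    """Draw the mask with 'x' for allowed positions and '.' for masked ones."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


# window size 3, look_backward=1, look_forward=1, 9 tokens (3 blocks)
print(render(block_local_mask(9, 3)))
# xxxxxx...
# xxxxxx...
# xxxxxx...
# xxxxxxxxx
# xxxxxxxxx
# xxxxxxxxx
# ...xxxxxx
# ...xxxxxx
# ...xxxxxx
```

All three tokens of a block share the same row, so in this sketch the pattern shifts by a whole window at a time rather than by one token.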
The README diagram confused me as well, so for others and future me: for look_forward=0, look_backward=1 the diagram looks exactly the same, except that you cut it along the line of symmetry (the diagonal) and remove everything above it.
EDIT: actually, with exact_windowsize set to True we get the pattern in the README that's not chopped.
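A rough sketch of how I read that flag, in case it helps. This is my assumption about the behaviour, not the library's implementation, and `exact_window_mask` is a hypothetical name. On top of the block pattern, each token is additionally limited to keys at most `window_size * look_backward` positions behind it and `window_size * look_forward` positions ahead of it.

```python
def exact_window_mask(seq_len, window_size, look_backward=1, look_forward=1):
    """Block pattern plus a per-token distance limit (assumed behaviour).

    Returns mask[i][j] == True when query token i may attend to key token j.
    """
    max_back = window_size * look_backward   # furthest key behind the query
    max_fwd = window_size * look_forward     # furthest key ahead of the query
    mask = []
    for i in range(seq_len):
        q_block = i // window_size
        row = []
        for j in range(seq_len):
            k_block = j // window_size
            in_blocks = q_block - look_backward <= k_block <= q_block + look_forward
            in_range = i - max_back <= j <= i + max_fwd
            row.append(in_blocks and in_range)
        mask.append(row)
    return mask


# window size 3, look_backward=1, look_forward=1, 9 tokens
for row in exact_window_mask(9, 3):
    print("".join("x" if ok else "." for ok in row))
# xxxx.....
# xxxxx....
# xxxxxx...
# xxxxxxx..
# .xxxxxxx.
# ..xxxxxxx
# ...xxxxxx
# ....xxxxx
# .....xxxx
```

Under this assumption the allowed region becomes a band that moves one token per row, which is the sliding-window look rather than the blocky one.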
I'm struggling to understand what the attention pattern looks like. I understand it works in blocks, and you can choose how many blocks forward and backward to attend to. However, it is not clear to me how the shift between blocks is done. Is the pattern as in the picture below (for a window size of 3, look_forward=1, look_backward=1)?
Or is the shift just one token?
Thank you! :)