Closed: kashif closed this issue 1 year ago
In your code you split the sequence into a prefix and a smaller window, and compute the cross-attention with respect to the prefix...
https://github.com/lucidrains/perceiver-ar-pytorch/blob/685d77d152c55ef7210336566b952de7da631f68/perceiver_ar_pytorch/perceiver_ar_pytorch.py#L284
However, in the diagram of the method, the whole sequence is used for the keys and values (K and V)... Can you kindly confirm?
Thank you!
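(Editor's note: for illustration, a minimal PyTorch sketch of the split being asked about. The shapes and variable names are made up; only PyTorch 2.x's F.scaled_dot_product_attention is assumed, not the repository's actual code.)

```python
import torch
import torch.nn.functional as F  # assumes PyTorch 2.x

# Made-up shapes: a 1024-token sequence split into a 768-token
# prefix and a 256-token trailing window.
seq = torch.randn(1, 1024, 512)               # (batch, seq_len, dim)
prefix, window = seq[:, :-256], seq[:, -256:]

# The apparent behavior being asked about: queries come from the
# window, while keys/values come from the prefix alone.
q = window
k = v = prefix
out = F.scaled_dot_product_attention(q, k, v)  # (1, 256, 512)
```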
@kashif yup, well aware!
I reattach the keys/values of the windowed sequence to the prefix being cross-attended to, so the K/V effectively cover the entire sequence
https://github.com/lucidrains/perceiver-ar-pytorch/blob/685d77d152c55ef7210336566b952de7da631f68/perceiver_ar_pytorch/perceiver_ar_pytorch.py#L167
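(Editor's note: a minimal sketch of the reattachment described above, continuing the earlier illustration with the same made-up shapes; it is not the library's actual API. Concatenating the window's own keys/values onto the prefix's makes K/V span the full sequence, matching the diagram.)

```python
import torch
import torch.nn.functional as F  # assumes PyTorch 2.x

seq = torch.randn(1, 1024, 512)               # (batch, seq_len, dim)
prefix, window = seq[:, :-256], seq[:, -256:]

q = window                                    # queries still come only from the window
# Reattach the window's keys/values to the prefix's, so K/V cover
# the entire sequence rather than the prefix alone.
k = v = torch.cat((prefix, window), dim=1)    # (1, 1024, 512)

# Within the window, position i must not attend to positions > i.
# Sketched with an explicit boolean mask (True = may attend): the
# prefix is fully visible, the reattached window part is causal.
prefix_len, window_len = prefix.shape[1], window.shape[1]
mask = torch.ones(window_len, prefix_len + window_len, dtype=torch.bool)
mask[:, prefix_len:] = torch.tril(torch.ones(window_len, window_len, dtype=torch.bool))

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)  # (1, 256, 512)
```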
Right! I missed that, thanks!