lucidrains / perceiver-ar-pytorch

Implementation of Perceiver AR, DeepMind's new long-context attention network based on the Perceiver architecture, in PyTorch

Why do we add empty dimension here? #6

Closed: inspirit closed this issue 1 year ago

inspirit commented 1 year ago

Hi Phil, would you mind explaining why we need to rearrange to (b, 1, n, d) here? I thought rotary_pos_emb should be (n, d) or (b, n, d): https://github.com/lucidrains/perceiver-ar-pytorch/blob/685d77d152c55ef7210336566b952de7da631f68/perceiver_ar_pytorch/perceiver_ar_pytorch.py#L153

inspirit commented 1 year ago

I figured it out: it's needed for correct broadcasting, due to the dropouts.
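
For anyone landing here later, a minimal sketch of the broadcasting issue (illustrative shapes only, not the repository's actual code). If the rotary embeddings carry a batch dimension, e.g. because sequence dropout makes the kept positions differ per batch element, a (b, n, d) tensor cannot broadcast against (b, h, n, d) queries/keys, since PyTorch aligns trailing dimensions and would pit b against h. Inserting the singleton heads dimension fixes this:

```python
import torch
from einops import rearrange

b, h, n, d = 2, 8, 1024, 64

q = torch.randn(b, h, n, d)   # queries: (batch, heads, seq, dim_head)
pos = torch.randn(b, n, d)    # rotary embeddings, batch-dependent positions

# without the extra dimension, (b, n, d) aligns as (1, b, n, d)
# against (b, h, n, d) and fails unless b happens to equal h
try:
    _ = q * pos
except RuntimeError:
    print('broadcast fails for (b, n, d) against (b, h, n, d)')

# the "empty" dimension lets the embeddings broadcast across all heads
pos = rearrange(pos, 'b n d -> b 1 n d')
out = q * pos                 # (b, 1, n, d) broadcasts to (b, h, n, d)
print(out.shape)              # torch.Size([2, 8, 1024, 64])
```

Note that a plain (n, d) embedding would broadcast fine on its own (it aligns as (1, 1, n, d)); the singleton dimension only becomes necessary once the batch dimension is present.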