Closed: inspirit closed this issue 1 year ago
Hi Phil, would you mind explaining why we need to rearrange to (b, 1, n, d) here? I thought rotary_pos_emb should be (n, d) or (b, n, d): https://github.com/lucidrains/perceiver-ar-pytorch/blob/685d77d152c55ef7210336566b952de7da631f68/perceiver_ar_pytorch/perceiver_ar_pytorch.py#L153
I figured it's needed for correct broadcasting because of the dropouts.
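For anyone landing here later, a rough sketch of the shape argument, not the repo's actual code. It assumes the queries/keys are laid out as (b, h, n, d) and that the dropout makes the rotary embedding differ per batch element, so it carries a batch dimension. `broadcast_shape` is a hypothetical pure-Python helper that mimics the trailing-dimension alignment rule PyTorch and NumPy use, and the sizes are made up:

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Shapes are aligned from the trailing dimension; missing leading
    dimensions count as 1 (the NumPy/PyTorch broadcasting rule).
    """
    out = []
    for i in range(1, max(len(a), len(b)) + 1):
        x = a[-i] if i <= len(a) else 1
        y = b[-i] if i <= len(b) else 1
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(max(x, y))
    return tuple(reversed(out))


# Hypothetical sizes: batch, heads, sequence length, head dimension
b, h, n, d = 2, 8, 16, 64
q_shape = (b, h, n, d)  # assumed layout of queries/keys

# A shared (n, d) embedding broadcasts over both batch and heads.
shared = broadcast_shape(q_shape, (n, d))

# A per-batch embedding needs a singleton head axis to line up.
per_batch = broadcast_shape(q_shape, (b, 1, n, d))

# Without the singleton axis, b gets aligned against h and fails.
try:
    broadcast_shape(q_shape, (b, n, d))
    three_dim_ok = True
except ValueError:
    three_dim_ok = False
```

So (n, d) only works while every batch element shares the same positions. Once the embedding is per-batch, (b, n, d) puts the batch axis where the head axis is expected, and inserting the 1 lets it broadcast across heads instead.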