lucidrains / perceiver-ar-pytorch

Implementation of Perceiver AR, DeepMind's new long-context attention network based on the Perceiver architecture, in PyTorch

Why do we add empty dimension here? #6

Closed: inspirit closed this issue 1 year ago

inspirit commented 1 year ago

Hi Phil, would you mind explaining why we need to rearrange to (b, 1, n, d) here? I thought rotary_pos_emb should be (n, d) or (b, n, d): https://github.com/lucidrains/perceiver-ar-pytorch/blob/685d77d152c55ef7210336566b952de7da631f68/perceiver_ar_pytorch/perceiver_ar_pytorch.py#L153

inspirit commented 1 year ago

I figured it out: it's needed for correct broadcasting, due to the dropouts.
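
For anyone landing here later, a minimal sketch of the broadcasting issue (illustrative shapes only, not the repository's actual code). If the rotary embeddings carry a batch dimension, e.g. because sequence dropout makes the kept positions differ per batch element, a (b, n, d) tensor cannot broadcast against (b, h, n, d) queries/keys, since PyTorch aligns trailing dimensions and would pit b against h. Inserting the singleton heads dimension fixes this:

```python
import torch
from einops import rearrange

b, h, n, d = 2, 8, 1024, 64

q = torch.randn(b, h, n, d)   # queries: (batch, heads, seq, dim_head)
pos = torch.randn(b, n, d)    # rotary embeddings, batch-dependent positions

# without the extra dimension, (b, n, d) aligns as (1, b, n, d)
# against (b, h, n, d) and fails unless b happens to equal h
try:
    _ = q * pos
except RuntimeError:
    print('broadcast fails for (b, n, d) against (b, h, n, d)')

# the "empty" dimension lets the embeddings broadcast across all heads
pos = rearrange(pos, 'b n d -> b 1 n d')
out = q * pos                 # (b, 1, n, d) broadcasts to (b, h, n, d)
print(out.shape)              # torch.Size([2, 8, 1024, 64])
```

Note that a plain (n, d) embedding would broadcast fine on its own (it aligns as (1, 1, n, d)); the singleton dimension only becomes necessary once the batch dimension is present.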