lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License
1.07k stars, 143 forks

Residual Connection #77

Closed: jiyounglee-0523 closed this issue 2 years ago

jiyounglee-0523 commented 2 years ago

Hi! Thank you for providing a wonderful implementation of Performer.

I just want to ask one small question. I noticed that there is no residual connection in the code. Did I miss something?

Or does Performer not have residual connections in the first place?

Thank you for listening to my question :)

ChaozhongLiu commented 2 months ago

@jiyounglee-0523 Hi there, it has been years, but did you find an answer yourself? I might be too dumb, as I haven't yet found the residual connection in the code. Any hint is appreciated!

lucidrains commented 2 months ago

@ChaozhongLiu it happens here

ChaozhongLiu commented 2 months ago

Ahhhhh my Thanks!
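
For readers who looked for the skip connection inside the attention or feedforward modules and came up empty: one common pattern, and a plausible reason it is easy to miss, is that the residual add lives in the container that loops over the layers rather than in the blocks themselves. The sketch below only illustrates that pattern. `ResidualSequence`, `toy_attention`, and `toy_feedforward` are hypothetical names made up for this example, not identifiers taken from performer-pytorch, and the toy functions stand in for real `torch.nn.Module` blocks.

```python
class ResidualSequence:
    """Hypothetical container: runs (f, g) pairs and adds the residual itself."""

    def __init__(self, layers):
        # layers: list of (f, g) pairs, e.g. (attention, feedforward) callables.
        # Neither f nor g adds its input back; that is this container's job.
        self.layers = list(layers)

    def __call__(self, x):
        for f, g in self.layers:
            x = x + f(x)  # residual around the first block (e.g. attention)
            x = x + g(x)  # residual around the second block (e.g. feedforward)
        return x


# Toy stand-ins so the example runs without any dependencies.
# With PyTorch, f and g would be modules and x a tensor; `+` works the same way.
def toy_attention(x):
    return 2 * x


def toy_feedforward(x):
    return x + 1


model = ResidualSequence([(toy_attention, toy_feedforward)])
out = model(1.0)  # 1 + 2*1 = 3, then 3 + (3 + 1) = 7
```

Keeping the add in the container means the individual blocks never mention a residual, so searching the attention or feedforward classes for `x + ...` turns up nothing even though every block is wrapped in a skip connection.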