idiap / fast-transformers

PyTorch library for fast transformer implementations

Is inconsistent axis order in RecurrentFullAttention intended? #54

Open hadaev8 opened 3 years ago

hadaev8 commented 3 years ago

FullAttention and RecurrentCrossFullAttention use an (N, S, H, E) axis order for keys/values, while RecurrentFullAttention uses (N, H, S, E).
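
For reference, a minimal sketch of the two layouts in plain PyTorch (not calling the library itself; the dimension names follow the docs' convention and the sizes are illustrative assumptions):

```python
import torch

# N = batch size, S = source sequence length, H = attention heads, E = head size.
# These sizes are arbitrary, just for illustration.
N, S, H, E = 2, 10, 4, 64

# Layout reportedly expected by FullAttention and RecurrentCrossFullAttention
# for keys/values: (N, S, H, E).
keys_nshe = torch.randn(N, S, H, E)

# Layout reportedly used by RecurrentFullAttention: (N, H, S, E).
# Converting between the two is a single permute of the S and H axes.
keys_nhse = keys_nshe.permute(0, 2, 1, 3)

print(keys_nshe.shape)  # torch.Size([2, 10, 4, 64])
print(keys_nhse.shape)  # torch.Size([2, 4, 10, 64])
```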