lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch
MIT License

Deterministic layers #58

Open anklebreaker opened 3 years ago

anklebreaker commented 3 years ago

Hey, thanks for making this project!

Quick question: you have a fix_projection_matrices() method to make the model output deterministic. However, if we're only using layers such as the SelfAttention module on their own, it appears that a new random projection matrix is created in FastAttention upon initialization. Is there a similar way to make those matrices savable?
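
For context, this is the kind of standalone usage I mean (a minimal sketch following the README example; the dims are just the README's example values):

```python
import torch
from performer_pytorch import SelfAttention

# standalone usage, as in the README; a random projection
# matrix is drawn inside FastAttention at construction time
attn = SelfAttention(dim = 512, heads = 8, causal = False)

x = torch.randn(1, 1024, 512)
out = attn(x)  # (1, 1024, 512)
```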

lucidrains commented 3 years ago

@anklebreaker hey! if you use the SelfAttention modules by themselves, I don't believe the projection matrices are redrawn, so they should always stay the same. (Correct me if I'm wrong on that..)
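
For what it's worth, here is a quick way to check this empirically (a minimal sketch, not an official recipe; it assumes dropout is disabled via eval mode, and that the projection matrix is registered as a buffer on FastAttention so it travels with state_dict, which appears to be the case):

```python
import torch
from performer_pytorch import SelfAttention

attn = SelfAttention(dim = 512, heads = 8, causal = False)
attn.eval()  # rule out dropout as a source of randomness

x = torch.randn(1, 1024, 512)

with torch.no_grad():
    out1 = attn(x)
    out2 = attn(x)

# no redraw between forward passes -> identical outputs
assert torch.allclose(out1, out2)

# if the random features are stored as a buffer, they round-trip
# through state_dict like any parameter
torch.save(attn.state_dict(), 'attn.pt')

attn2 = SelfAttention(dim = 512, heads = 8, causal = False)
attn2.load_state_dict(torch.load('attn.pt'))
attn2.eval()

with torch.no_grad():
    assert torch.allclose(out1, attn2(x))
```

If both asserts pass, the projection matrix is stable across forward passes and is restored on load, so standalone modules should already behave deterministically without an explicit fix_projection_matrices() call.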