lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License

Current version seems to make saving and loading through model state dictionaries difficult #33

Open ThomasBJones2 opened 3 years ago

ThomasBJones2 commented 3 years ago

Because the projection matrix is created on the fly during the first forward pass, it isn't present in a brand new model. As a result, a saved state dictionary that includes the projection matrix is rejected when loaded into a fresh model, until that model has gone through at least one forward pass.
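A minimal sketch of the behaviour described above, using two hypothetical toy modules (`LazyProjection` and `EagerProjection` are illustrative names, not classes from performer-pytorch). It assumes the projection matrix is stored as a buffer; the lazy variant registers it in `forward`, the eager variant in `__init__`. Registering in `__init__` is one common way to avoid the problem and is not necessarily how the library addressed it.

```python
import torch
from torch import nn


class LazyProjection(nn.Module):
    """Hypothetical module: the buffer only exists after the first forward pass."""

    def __init__(self, dim, nb_features):
        super().__init__()
        self.dim = dim
        self.nb_features = nb_features

    def forward(self, x):
        # The buffer is created on the fly, so a fresh model has no such key.
        if not hasattr(self, "projection_matrix"):
            self.register_buffer(
                "projection_matrix", torch.randn(self.nb_features, self.dim)
            )
        return x @ self.projection_matrix.t()


class EagerProjection(nn.Module):
    """Hypothetical alternative: the buffer is registered at construction time."""

    def __init__(self, dim, nb_features):
        super().__init__()
        self.register_buffer("projection_matrix", torch.randn(nb_features, dim))

    def forward(self, x):
        return x @ self.projection_matrix.t()


def try_load(model, state):
    """Return True if a strict load_state_dict succeeds, False if it is rejected."""
    try:
        model.load_state_dict(state)
        return True
    except RuntimeError:
        # Strict loading raises RuntimeError on unexpected or missing keys.
        return False


# "Train" a lazy model: one forward pass creates the buffer, then save its state.
trained = LazyProjection(dim=4, nb_features=8)
trained(torch.randn(2, 4))
state = trained.state_dict()

# A brand new lazy model has no projection_matrix key, so the load is rejected.
fresh = LazyProjection(dim=4, nb_features=8)
lazy_fresh_ok = try_load(fresh, state)

# After one forward pass the key exists and the same state loads.
fresh(torch.randn(2, 4))
lazy_warm_ok = try_load(fresh, state)

# With the buffer registered in __init__, a fresh model loads the state directly.
eager_ok = try_load(EagerProjection(dim=4, nb_features=8), state)

print(lazy_fresh_ok, lazy_warm_ok, eager_ok)
```

Running a dummy forward pass before loading works as a stopgap for the lazy variant, but it requires knowing a valid input shape up front, which is why creating the buffer at construction time is usually the more convenient design.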

lucidrains commented 3 years ago

@ThomasBJones2 oops! fixed https://github.com/lucidrains/performer-pytorch/releases/tag/0.11.0