lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch
MIT License

Huge model state dict size? #86

Open · Liyue1d opened 2 years ago

Liyue1d commented 2 years ago

Hi,

I am getting very large files (100 MB) when I save a module containing a PerformerLM module the recommended way: `torch.save(model.state_dict(), 'path')`.

I tried saving a freshly initialized PerformerLM module on its own and got the same result. Is this normal?
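One rough way to judge whether a file of this size is expected: a float32 state dict stores 4 bytes per tensor element (plus any persistent buffers), so the file size is roughly the total element count times 4, whether or not the model has been trained. The sketch below does that arithmetic for a *hypothetical*, simplified transformer LM layout (token embedding, per-layer attention and feedforward projections, an untied output projection). The helper names `tensor_bytes` and `rough_lm_shapes`, the layer layout, and the example config are all assumptions for illustration, not PerformerLM's exact module structure.

```python
from math import prod


def tensor_bytes(shape, bytes_per_elem=4):
    """Bytes for one tensor: element count times element size (4 for float32)."""
    return prod(shape) * bytes_per_elem


def rough_lm_shapes(num_tokens, dim, depth, ff_mult=4):
    """Hypothetical, simplified weight shapes for a generic transformer LM.

    This is NOT the exact PerformerLM layout; it only approximates the
    largest tensors so the order of magnitude can be checked.
    """
    shapes = [(num_tokens, dim)]  # token embedding table
    for _ in range(depth):
        shapes += [(dim, dim)] * 4  # q, k, v and output projections
        shapes += [(dim * ff_mult, dim), (dim, dim * ff_mult)]  # feedforward
        shapes += [(dim,)] * 4  # two layer norms (weight + bias each)
    shapes += [(num_tokens, dim), (num_tokens,)]  # untied output projection + bias
    return shapes


# Hypothetical config: 20k-token vocabulary, dim 512, only 2 layers.
shapes = rough_lm_shapes(num_tokens=20000, dim=512, depth=2)
n_params = sum(prod(s) for s in shapes)
total_bytes = sum(tensor_bytes(s) for s in shapes)
print(f"{n_params:,} params, {total_bytes / 1e6:.1f} MB as float32")
```

Under these assumptions this prints `26,795,552 params, 107.2 MB as float32`: even a shallow model lands near 100 MB, mostly because the two `num_tokens × dim` matrices (embedding and output projection) dominate. Storing the same tensors in a 2-byte dtype (`bytes_per_elem=2`) would halve the estimate.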