lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License

Modify the transformer tutorial based on performer #95

Open HelloWorldLTY opened 1 year ago

HelloWorldLTY commented 1 year ago

Hi, I would like to use Performer to complete this PyTorch transformer pre-training tutorial:

https://pytorch.org/tutorials/search.html?q=pre-training&check_keywords=yes&area=default

However, I received such an error:

```
...ext, mask, context_mask, **kwargs)
    426 if exists(context_mask):
    427     global_mask = context_mask[:, None, :, None]
--> 428     v.masked_fill_(~global_mask, 0.)
    430 if exists(pos_emb) and not cross_attend:
    431     q, k = apply_rotary_pos_emb(q, k, pos_emb)

RuntimeError: The expanded size of the tensor (20) must match the existing size (35) at non-singleton dimension 2. Target sizes: [35, 4, 20, 64]. Tensor sizes: [35, 1, 35, 1]
```

I don't know why this happens. Is this model's masking interface different from a standard transformer's? Thanks a lot.
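A likely cause, reading the shapes in the error (this is an assumption about the setup, not confirmed in the thread): the tutorial builds a square `(seq_len, seq_len)` causal attention mask, while performer-pytorch's `mask` keyword expects a boolean `(batch, seq_len)` padding mask and handles causality via the `causal=True` constructor flag instead. Inside attention, `v` is `(batch, heads, seq, dim_head)`, so a square mask broadcasts incorrectly. A minimal sketch reproducing the mismatch with the shapes from the error, then the mask shape that does broadcast:

```python
import torch

# shapes taken from the error message: v is (batch=35, heads=4, seq=20, dim_head=64)
v = torch.randn(35, 4, 20, 64)

# tutorial-style square (seq, seq) mask -- the wrong shape for this library
square_mask = torch.ones(35, 35).bool()
bad_global_mask = square_mask[:, None, :, None]   # -> (35, 1, 35, 1)
try:
    v.masked_fill_(~bad_global_mask, 0.)          # dim 2: 20 vs 35 -> RuntimeError
except RuntimeError as e:
    print(e)

# what performer-pytorch expects: a (batch, seq) boolean padding mask, True = keep
pad_mask = torch.ones(35, 20).bool()
global_mask = pad_mask[:, None, :, None]          # -> (35, 1, 20, 1), broadcasts against v
v.masked_fill_(~global_mask, 0.)
```

So a fix to try would be dropping the tutorial's `generate_square_subsequent_mask` output, constructing the model with `causal=True`, and passing only a `(batch, seq_len)` boolean padding mask (or no mask at all if there is no padding).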