lucidrains / performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License · 1.1k stars · 144 forks
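For orientation, the repository's README documents a `PerformerLM` language-model wrapper as the main entry point. A minimal usage sketch follows; the hyperparameter values are illustrative choices, not the library's defaults:

```python
import torch
from performer_pytorch import PerformerLM

# Minimal sketch of the documented PerformerLM interface; the
# hyperparameter values below are illustrative, not library defaults.
model = PerformerLM(
    num_tokens = 20000,   # vocabulary size
    max_seq_len = 2048,   # maximum sequence length
    dim = 512,            # model dimension
    depth = 6,            # number of Performer blocks
    heads = 8,            # attention heads
    causal = True         # autoregressive (decoder-style) attention
)

x = torch.randint(0, 20000, (1, 2048))   # dummy token ids
logits = model(x)                        # -> (1, 2048, 20000)
```

Many of the issues below (masking, causal speed, projection redrawing) concern options on this constructor or on the underlying `FastAttention` module.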
Issues (newest first)
#97 · FastAttention is slower and uses more GPU memory · phyllispeng123 · opened 1 week ago · 0 comments
#96 · Separate Transformer Encoder & Decoder modules with linear attention? · harshakmohan · opened 3 months ago · 0 comments
#95 · Modify the transformer tutorial based on Performer · HelloWorldLTY · opened 1 year ago · 0 comments
#94 · Cross-attention with arbitrary causal mask · BarKetPlace · opened 1 year ago · 0 comments
#93 · Pretrained example · jubueche · opened 1 year ago · 0 comments
#92 · Performer Pytorch slower than expected; help with understanding parameter count · michaelweihaosong · opened 1 year ago · 1 comment
#91 · Replicating nn.MultiHeadAttention with multiple Performer SelfAttention modules · JGittles · opened 1 year ago · 0 comments
#90 · I want to use Performer on MAE · Zhaoyi-Yan · opened 2 years ago · 0 comments
#89 · Question about masking · Microbiods · opened 2 years ago · 2 comments
#88 · Question: Is Performer order equivariant? (can it transform an unordered set of tensors) · nmakes · opened 2 years ago · 0 comments
#87 · Using Performer with GNNs · jah377 · opened 2 years ago · 0 comments
#86 · Huge model state dict size? · Liyue1d · opened 2 years ago · 0 comments
#85 · Attention map · merouone · closed 2 years ago · 2 comments
#84 · Performer Plain · Rachel66666 · opened 2 years ago · 0 comments
#83 · How to test the Performer architecture for training new models? · ayan-iiitd · opened 2 years ago · 1 comment
#82 · Output inconsistent for autoregressive Performer · GanjinZero · opened 2 years ago · 2 comments
#81 · Rotary Position Embedding · ahmdtaha · opened 2 years ago · 0 comments
#80 · torch_tensorrt compilation fails · FredHaa · opened 2 years ago · 0 comments
#79 · Way to make two elements invisible? · 1140310118 · opened 3 years ago · 0 comments
#78 · Add repetition penalty for text generation · AlexandreDey · closed 3 years ago · 0 comments
#77 · Residual Connection · jiyounglee-0523 · closed 3 years ago · 3 comments
#76 · torch.max(data_dash) bug · martinpflaum · closed 3 years ago · 2 comments
#75 · Fix torch.qr deprecation warning · Erotemic · closed 3 years ago · 1 comment
#74 · Some little changes · vasiliyeskin · opened 3 years ago · 0 comments
#73 · Hyperbolic cosine based estimator · gaganbahga · opened 3 years ago · 0 comments
#72 · Relative Positional Encoding for Linear Attention Models · Vbansal21 · closed 2 years ago · 3 comments
#71 · Names `to_k`, `to_q`, `to_v`, `to_out` cause issues · JamesDeAntonis · opened 3 years ago · 0 comments
#70 · Recover attention scores · carlomarxdk · opened 3 years ago · 3 comments
#69 · FastAttention doesn't give results in agreement with standard attention? · simonaxelrod · opened 3 years ago · 7 comments
#68 · Input and Context size in CrossAttention · caffeinetoomuch · closed 3 years ago · 2 comments
#67 · Performer Benchmark · CavallucciMartina · opened 3 years ago · 0 comments
#66 · Causal Performer slower than causal regular attention · JamesDeAntonis · opened 3 years ago · 3 comments
#65 · `to_out` bias · JamesDeAntonis · closed 3 years ago · 3 comments
#64 · Causal linear attention benchmark · caffeinetoomuch · closed 3 years ago · 13 comments
#63 · Why is bias true in `to_<q,k,v>`? · JamesDeAntonis · closed 3 years ago · 4 comments
#62 · Decoder Mask · Muennighoff · opened 3 years ago · 0 comments
#61 · Error with check_redraw_projections when using DataParallel · Warvito · closed 3 years ago · 4 comments
#60 · Context-specific embeddings from language model? · rainwala · opened 3 years ago · 0 comments
#59 · Allow Performer to be used on CPU-only torch · i404788 · closed 3 years ago · 2 comments
#58 · Deterministic layers · anklebreaker · opened 3 years ago · 1 comment
#57 · Saving checkpoints during training and loading · ylhsieh · closed 3 years ago · 3 comments
#56 · Extra FF when using cross attention · gulnazaki · closed 3 years ago · 8 comments
#55 · FixNorm alongside ScaleNorm · gulnazaki · opened 3 years ago · 3 comments
#54 · Added fixed and axial positional embedding option · gulnazaki · closed 3 years ago · 1 comment
#53 · Decoder randomly outputs NaN tensor · y-rokutan · closed 3 years ago · 5 comments
#52 · Performance gain from replacing original attention with fast attention in this repo? · phypan11 · opened 3 years ago · 2 comments
#51 · Applying decoder input mask? · maxmax1992 · closed 3 years ago · 2 comments
#50 · Bug fix in original google-research implementation · gulnazaki · closed 3 years ago · 3 comments
#49 · Plain Performer, if you are working with say images or other modalities · haoshuai714 · opened 3 years ago · 1 comment
#48 · Replacing Attention module of Vision Transformer with SelfAttention module of Performer? · PascalHbr · opened 3 years ago · 6 comments