lucidrains / performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License · 1.1k stars · 144 forks
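For orientation, the repository's README documents a `PerformerLM` language-model wrapper as the main entry point. A minimal usage sketch follows; the hyperparameter values are illustrative choices, not the library's defaults:

```python
import torch
from performer_pytorch import PerformerLM

# Minimal sketch of the documented PerformerLM interface; the
# hyperparameter values below are illustrative, not library defaults.
model = PerformerLM(
    num_tokens = 20000,   # vocabulary size
    max_seq_len = 2048,   # maximum sequence length
    dim = 512,            # model dimension
    depth = 6,            # number of Performer blocks
    heads = 8,            # attention heads
    causal = True         # autoregressive (decoder-style) attention
)

x = torch.randint(0, 20000, (1, 2048))   # dummy token ids
logits = model(x)                        # -> (1, 2048, 20000)
```

Many of the issues below (masking, causal speed, projection redrawing) concern options on this constructor or on the underlying `FastAttention` module.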
Issues (newest first)
#97 · FastAttention is slower and uses more GPU memory · phyllispeng123 · opened 1 week ago · 0 comments
#96 · Separate Transformer Encoder & Decoder modules with linear attention? · harshakmohan · opened 3 months ago · 0 comments
#95 · Modify the transformer tutorial based on Performer · HelloWorldLTY · opened 1 year ago · 0 comments
#94 · Cross-attention with arbitrary causal mask · BarKetPlace · opened 1 year ago · 0 comments
#93 · Pretrained example · jubueche · opened 1 year ago · 0 comments
#92 · Performer Pytorch slower than expected; help with understanding parameter count · michaelweihaosong · opened 1 year ago · 1 comment
#91 · Replicating nn.MultiHeadAttention with multiple Performer SelfAttention modules · JGittles · opened 1 year ago · 0 comments
#90 · I want to use Performer on MAE · Zhaoyi-Yan · opened 2 years ago · 0 comments
#89 · Question about masking · Microbiods · opened 2 years ago · 2 comments
#88 · Question: Is Performer order equivariant? (can it transform an unordered set of tensors) · nmakes · opened 2 years ago · 0 comments
#87 · Using Performer with GNNs · jah377 · opened 2 years ago · 0 comments
#86 · Huge model state dict size? · Liyue1d · opened 2 years ago · 0 comments
#85 · Attention map · merouone · closed 2 years ago · 2 comments
#84 · Performer Plain · Rachel66666 · opened 2 years ago · 0 comments
#83 · How to test the Performer architecture for training new models? · ayan-iiitd · opened 2 years ago · 1 comment
#82 · Output inconsistent for autoregressive Performer · GanjinZero · opened 2 years ago · 2 comments
#81 · Rotary Position Embedding · ahmdtaha · opened 2 years ago · 0 comments
#80 · torch_tensorrt compilation fails · FredHaa · opened 2 years ago · 0 comments
#79 · Way to make two elements invisible? · 1140310118 · opened 3 years ago · 0 comments
#78 · Add repetition penalty for text generation · AlexandreDey · closed 3 years ago · 0 comments
#77 · Residual Connection · jiyounglee-0523 · closed 3 years ago · 3 comments
#76 · torch.max(data_dash) bug · martinpflaum · closed 3 years ago · 2 comments
#75 · Fix torch.qr deprecation warning · Erotemic · closed 3 years ago · 1 comment
#74 · Some little changes · vasiliyeskin · opened 3 years ago · 0 comments
#73 · Hyperbolic cosine based estimator · gaganbahga · opened 3 years ago · 0 comments
#72 · Relative Positional Encoding for Linear Attention Models · Vbansal21 · closed 2 years ago · 3 comments
#71 · Names `to_k`, `to_q`, `to_v`, `to_out` cause issues · JamesDeAntonis · opened 3 years ago · 0 comments
#70 · Recover attention scores · carlomarxdk · opened 3 years ago · 3 comments
#69 · FastAttention doesn't give results in agreement with standard attention? · simonaxelrod · opened 3 years ago · 7 comments
#68 · Input and Context size in CrossAttention · caffeinetoomuch · closed 3 years ago · 2 comments
#67 · Performer Benchmark · CavallucciMartina · opened 3 years ago · 0 comments
#66 · Causal Performer slower than causal regular attention · JamesDeAntonis · opened 3 years ago · 3 comments
#65 · `to_out` bias · JamesDeAntonis · closed 3 years ago · 3 comments
#64 · Causal linear attention benchmark · caffeinetoomuch · closed 3 years ago · 13 comments
#63 · Why is bias true in `to_<q,k,v>`? · JamesDeAntonis · closed 3 years ago · 4 comments
#62 · Decoder Mask · Muennighoff · opened 3 years ago · 0 comments
#61 · Error with check_redraw_projections when using DataParallel · Warvito · closed 3 years ago · 4 comments
#60 · Context-specific embeddings from language model? · rainwala · opened 3 years ago · 0 comments
#59 · Allow Performer to be used on CPU-only torch · i404788 · closed 3 years ago · 2 comments
#58 · Deterministic layers · anklebreaker · opened 3 years ago · 1 comment
#57 · Saving checkpoints during training and loading · ylhsieh · closed 3 years ago · 3 comments
#56 · Extra FF when using cross attention · gulnazaki · closed 3 years ago · 8 comments
#55 · FixNorm alongside ScaleNorm · gulnazaki · opened 3 years ago · 3 comments
#54 · Added fixed and axial positional embedding option · gulnazaki · closed 3 years ago · 1 comment
#53 · Decoder randomly outputs NaN tensor · y-rokutan · closed 3 years ago · 5 comments
#52 · Performance gain from replacing original attention with fast attention in this repo? · phypan11 · opened 3 years ago · 2 comments
#51 · Applying decoder input mask? · maxmax1992 · closed 3 years ago · 2 comments
#50 · Bug fix in original google-research implementation · gulnazaki · closed 3 years ago · 3 comments
#49 · Plain Performer, if you are working with say images or other modalities · haoshuai714 · opened 3 years ago · 1 comment
#48 · Replacing Attention module of Vision Transformer with SelfAttention module of Performer? · PascalHbr · opened 3 years ago · 6 comments