lucidrains/performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License, 1.08k stars, 141 forks
Issues (newest first)
#46 SelfAttention layer seems to have large error relative to nn.MultiheadAttention? (jueseph, opened 3 years ago, 8 comments)
#45 Question: torch.max term used in `softmax_kernel` (ClawangTU, closed 3 years ago, 4 comments)
#44 No fp16 support from fast-transformers (CausalDotProduct) (gulnazaki, opened 3 years ago, 75 comments)
#43 Performer Encoder Decoder architecture (gulnazaki, closed 3 years ago, 0 comments)
#42 Triangular matrices ? (jeremycochoy, closed 3 years ago, 10 comments)
#41 fix normalization for fast cuda version of causal (lucidrains, closed 3 years ago, 0 comments)
#40 wrong implementation for autoregressive self-attention (Sleepychord, closed 3 years ago, 10 comments)
#39 Use performer for finetunig task (usmanhidral, opened 3 years ago, 0 comments)
#38 [Feature] EncoderDecoder framework, similar to ReformerEncDec (gulnazaki, closed 3 years ago, 22 comments)
#37 RuntimeError: CUDA error: no kernel image is available for execution on the device (james20141606, opened 3 years ago, 1 comment)
#36 A small question regarding `softmax_kernel` (tianylin98, closed 3 years ago, 1 comment)
#35 Input ordering is not explicitly stated (haakom, closed 3 years ago, 2 comments)
#34 Difficult installing on Windows machine (rasin-tsukuba, opened 3 years ago, 0 comments)
#33 Current version seems to make saving and loading through model state dictionaries difficult (ThomasBJones2, opened 3 years ago, 1 comment)
#32 Any performance comparison on standard benchmarks? (KK666-AI, opened 3 years ago, 0 comments)
#31 Causal for images (Etzelkut, closed 3 years ago, 2 comments)
#30 Add feature_redraw_interval option (norabelrose, closed 3 years ago, 8 comments)
#29 Floating point exception @ loss.backward() (AhmedCheikhRouhou, opened 3 years ago, 0 comments)
#28 There are no tests in this project, use_rezero=True is non-functional (fcampagne, closed 3 years ago, 10 comments)
#27 Bug in FastAttention.forward() (shayeboshi, closed 3 years ago, 5 comments)
#26 is dependency on pytorch-fast-transformers necessary? (fcampagne, closed 3 years ago, 2 comments)
#25 add missing device assignment (theblackcat102, closed 3 years ago, 1 comment)
#24 A Concrete Example of Use Performer-Pytorch into other Model checkpoint? (ghost, opened 3 years ago, 4 comments)
#23 Performer Decoder (qazwsxal, closed 3 years ago, 3 comments)
#22 Allow for no local attention heads (qazwsxal, closed 3 years ago, 0 comments)
#21 Causal AutoRegressive Doubt (HaldiramSharma, closed 3 years ago, 3 comments)
#20 Is it slower than original bert when training? (yygle, closed 3 years ago, 3 comments)
#19 Relative position encoding (sooheon, closed 3 years ago, 14 comments)
#18 definition of layer_drop() (shi27feng, closed 3 years ago, 2 comments)
#17 Adding zeroes in softmax_kernel (marhlder, closed 3 years ago, 2 comments)
#16 Load weights of transformer into PerformerLM (Mazgis47, opened 3 years ago, 6 comments)
#15 unable to import cuda code for auto-regressive Performer (batrlatom, opened 3 years ago, 8 comments)
#14 Regarding DDP and reversible networks (Parskatt, closed 3 years ago, 11 comments)
#13 Inverse of renormalization matrix being used? (sidnarayanan, closed 3 years ago, 1 comment)
#12 use performer for image detection (madurner, closed 3 years ago, 7 comments)
#11 pip install error (nu11s3c, opened 3 years ago, 5 comments)
#10 Results are not deterministic in eval mode (arti32lehtonen, closed 3 years ago, 4 comments)
#9 Suggestion: Renormalization step for linear attention (Parskatt, closed 3 years ago, 2 comments)
#8 Issue with biased estimates from QR decomposition (Parskatt, closed 3 years ago, 9 comments)
#7 Causal model running on GPU (Warvito, closed 3 years ago, 7 comments)
#6 Redrawing normalized samples using QR slows down training (Parskatt, closed 3 years ago, 4 comments)
#5 Small issue in random matrix generation (Parskatt, closed 3 years ago, 1 comment)
#4 Question: Scaling down number of random features depending on number of heads? (Parskatt, closed 3 years ago, 4 comments)
#3 Feature Request: Enable generalized attention. (Parskatt, closed 3 years ago, 2 comments)
#2 Show what is the performance on enwiki8 is across your projects (bratao, closed 4 years ago, 10 comments)
#1 Collaborate on Implementation? (calclavia, closed 4 years ago, 9 comments)