lucidrains performer-pytorch issues

lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch

MIT License

1.08k stars, 141 forks

issues

SelfAttention layer seems to have large error relative to nn.MultiheadAttention?

#46 jueseph opened 3 years ago
8
Question: torch.max term used in `softmax_kernel`

#45 ClawangTU closed 3 years ago
4
No fp16 support from fast-transformers (CausalDotProduct)

#44 gulnazaki opened 3 years ago
75
Performer Encoder Decoder architecture

#43 gulnazaki closed 3 years ago
0
Triangular matrices ?

#42 jeremycochoy closed 3 years ago
10
fix normalization for fast cuda version of causal

#41 lucidrains closed 3 years ago
0
wrong implementation for autoregressive self-attention

#40 Sleepychord closed 3 years ago
10
Use performer for finetuning task

#39 usmanhidral opened 3 years ago
0
[Feature] EncoderDecoder framework, similar to ReformerEncDec

#38 gulnazaki closed 3 years ago
22
RuntimeError: CUDA error: no kernel image is available for execution on the device

#37 james20141606 opened 3 years ago
1
A small question regarding `softmax_kernel`

#36 tianylin98 closed 3 years ago
1
Input ordering is not explicitly stated

#35 haakom closed 3 years ago
2
Difficulty installing on Windows machine

#34 rasin-tsukuba opened 3 years ago
0
Current version seems to make saving and loading through model state dictionaries difficult

#33 ThomasBJones2 opened 3 years ago
1
Any performance comparison on standard benchmarks?

#32 KK666-AI opened 3 years ago
0
Causal for images

#31 Etzelkut closed 3 years ago
2
Add feature_redraw_interval option

#30 norabelrose closed 3 years ago
8
Floating point exception @ loss.backward()

#29 AhmedCheikhRouhou opened 3 years ago
0
There are no tests in this project, use_rezero=True is non-functional

#28 fcampagne closed 3 years ago
10
Bug in FastAttention.forward()

#27 shayeboshi closed 3 years ago
5
is dependency on pytorch-fast-transformers necessary?

#26 fcampagne closed 3 years ago
2
add missing device assignment

#25 theblackcat102 closed 3 years ago
1
A Concrete Example of Use Performer-Pytorch into other Model checkpoint?

#24 ghost opened 3 years ago
4
Performer Decoder

#23 qazwsxal closed 3 years ago
3
Allow for no local attention heads

#22 qazwsxal closed 3 years ago
0
Causal AutoRegressive Doubt

#21 HaldiramSharma closed 3 years ago
3
Is it slower than original bert when training?

#20 yygle closed 3 years ago
3
Relative position encoding

#19 sooheon closed 3 years ago
14
definition of layer_drop()

#18 shi27feng closed 3 years ago
2
Adding zeroes in softmax_kernel

#17 marhlder closed 3 years ago
2
Load weights of transformer into PerformerLM

#16 Mazgis47 opened 3 years ago
6
unable to import cuda code for auto-regressive Performer

#15 batrlatom opened 3 years ago
8
Regarding DDP and reversible networks

#14 Parskatt closed 3 years ago
11
Inverse of renormalization matrix being used?

#13 sidnarayanan closed 3 years ago
1
use performer for image detection

#12 madurner closed 3 years ago
7
pip install error

#11 nu11s3c opened 3 years ago
5
Results are not deterministic in eval mode

#10 arti32lehtonen closed 3 years ago
4
Suggestion: Renormalization step for linear attention

#9 Parskatt closed 3 years ago
2
Issue with biased estimates from QR decomposition

#8 Parskatt closed 3 years ago
9
Causal model running on GPU

#7 Warvito closed 3 years ago
7
Redrawing normalized samples using QR slows down training

#6 Parskatt closed 3 years ago
4
Small issue in random matrix generation

#5 Parskatt closed 3 years ago
1
Question: Scaling down number of random features depending on number of heads?

#4 Parskatt closed 3 years ago
4
Feature Request: Enable generalized attention.

#3 Parskatt closed 3 years ago
2
Show what the performance on enwiki8 is across your projects

#2 bratao closed 4 years ago
10
Collaborate on Implementation?

#1 calclavia closed 4 years ago
9