lucidrains / x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License · 4.63k stars · 395 forks
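For context on the issues below, here is a minimal usage sketch following the interface shown in the repository's README; the hyperparameter values are illustrative, and `attn_flash` is the flag discussed in #173:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Minimal autoregressive setup per the README; sizes are illustrative.
model = TransformerWrapper(
    num_tokens = 20000,          # vocabulary size
    max_seq_len = 1024,          # maximum sequence length
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8,
        attn_flash = True        # flash attention flag discussed in #173
    )
)

x = torch.randint(0, 20000, (1, 1024))
logits = model(x)                # (1, 1024, 20000)
```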
Issues
#175  Unused Dropout Parameter (XiaoWang-Github, closed 1 year ago, 1 comment)
#174  Dimension mismatch with cross attention (pradeep-pyro, closed 1 year ago, 4 comments)
#173  Feature request: don't specify attn_flash. Select when possible (pfeatherstone, opened 1 year ago, 2 comments)
#172  Question: normalizing mask shape (pfeatherstone, closed 1 year ago, 4 comments)
#171  NTK-aware Scaled RoPE (Jingyu-Fan, closed 1 year ago, 41 comments)
#170  Is there a plan to handle the inference slowness? e.g. KV cache (liuzhuang1024, closed 1 year ago, 9 comments)
#169  correct input (chogamy, closed 1 year ago, 1 comment)
#168  Back-propagation on Mask for attention layers (gaasher, opened 1 year ago, 1 comment)
#167  ContinuousTransformerWrapper returning list of tensors as opposed to stack of tensors in 1.16.20 (gaasher, closed 1 year ago, 1 comment)
#166  Feature request: support return_mems in ContinuousTransformerWrapper (pfeatherstone, opened 1 year ago, 16 comments)
#165  [Bug] attn_sparse_topk: NameError: name 'dots' is not defined (Jingyu-Fan, closed 1 year ago, 2 comments)
#164  Attention mask, is True True? (TKassis, closed 1 year ago, 7 comments; see the mask-convention sketch after this list)
#163  RuntimeError: output with shape [1, 1, 16, 16] doesn't match the broadcast shape [1, 8, 16, 16] (BMontens, closed 1 year ago, 6 comments)
#162  GPT training problem (phuvinhnguyen, opened 1 year ago, 0 comments)
#161  Replace .triu calls to allow ONNX export for CPU runtime (jorgetavares, closed 1 year ago, 7 comments)
#160  Incorrect boolean mask in flash attention (stoprightthere, closed 1 year ago, 7 comments)
#159  How to use the RoPE scaling with x_transformers? (cutoken, opened 1 year ago, 21 comments)
#158  LoRA fine-tuning? (hugofloresgarcia, closed 1 year ago, 2 comments)
#157  Any plans to make a JAX iteration of this repository? We really need it (kyegomez, opened 1 year ago, 1 comment)
#156  Small syntax error in LayerIntermediates (prestonyun, closed 1 year ago, 1 comment)
#155  Any plans to implement the new flash sparse attention? (kyegomez, closed 1 year ago, 5 comments)
#154  Fix weight tying bug (RameshArvind, closed 1 year ago, 1 comment)
#153  ALiBi: buffered bias slicing gets confusing when `i != j` (antony-frolov, opened 1 year ago, 0 comments)
#152  AlibiPositionalBias: slicing buffered bias (antony-frolov, closed 1 year ago, 1 comment)
#151  Question about the over-smoothing problem (Baran-phys, closed 4 months ago, 2 comments)
#150  Dimension mismatch in attention (v1.16.1+) (pradeep-pyro, closed 1 year ago, 2 comments)
#149  Cascading heads (lucidrains, closed 1 year ago, 0 comments)
#148  exploring cascading heads from efficientvit paper, proposed for reduc… (lucidrains, closed 1 year ago, 0 comments)
#147  Bug fix in the forward method of the LearnedAlibiPositionalBias class (taemincho, closed 1 year ago, 1 comment)
#146  Question about ViTransformerWrapper (XiaoWang-Github, closed 1 year ago, 3 comments)
#145  Question: clarification of ResiDual implementation (alstonlo, closed 1 year ago, 6 comments)
#144  Flash is not flash (liujuncn, opened 1 year ago, 1 comment)
#143  RuntimeError: No available kernel. Aborting execution. (kyegomez, opened 1 year ago, 12 comments)
#142  Enhanced recurrence question (danieltudosiu, closed 1 year ago, 4 comments)
#141  Could not call torch.save on the model (frederikfab, closed 1 year ago, 1 comment)
#140  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 0 comments)
#139  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 0 comments)
#138  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 0 comments)
#137  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 6 comments)
#136  Feature request: use scaled_dot_product_attention() (pfeatherstone, closed 1 year ago, 6 comments)
#135  Suggestion for OOD extrapolation power of Transformers (Baran-phys, closed 1 year ago, 5 comments)
#134  Cross attention between tensors of different shapes (Baran-phys, closed 1 year ago, 4 comments)
#133  Feature request: generate top k sequences (yzhang-github-pub, opened 1 year ago, 2 comments)
#132  Implementing a small ViT-VQGAN (OhGreat, closed 3 months ago, 0 comments)
#131  typo? (yzhang-github-pub, closed 1 year ago, 1 comment)
#130  test out stable entropy hypothesis (wip) (lucidrains, closed 1 year ago, 0 comments)
#129  BERT Training and Word-Level Tokenization (XiaoWang-Github, closed 1 year ago, 1 comment)
#128  Allow specifying scaled_sinu_pos_emb using XTransformers Interface (ncoop57, closed 1 year ago, 1 comment)
#127  Possible Bug for Residual (XiaoWang-Github, closed 1 year ago, 2 comments)
#126  ONNX export for the enwik8 example (bitdom8, opened 1 year ago, 2 comments)
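Several issues above (#172, #164, #160) turn on the boolean attention-mask convention. As a point of reference, a minimal sketch assuming the convention from the README, where True marks a position that is kept and attended to; shapes and values are illustrative:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
)

x = torch.randint(0, 20000, (2, 1024))

# Boolean key-padding mask of shape (batch, seq_len):
#   True  -> real token, attended to
#   False -> padding, ignored by attention
mask = torch.ones(2, 1024, dtype = torch.bool)
mask[:, 512:] = False            # e.g. second half of each sequence is padding

logits = model(x, mask = mask)   # (2, 1024, 20000)
```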