lucidrains / x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License
4.63k stars · 395 forks
issues
#225 · Removed biases breaks pre-trained models · zqevans · closed 8 months ago · 5 comments
#224 · Fix rotary embeddings when mems != None · pfeatherstone · closed 8 months ago · 11 comments
#223 · XL-recurrence with RotaryEmbedding and mems not working correctly. · pfeatherstone · closed 8 months ago · 34 comments
#222 · Enhancement: Multi Input/Output transformers · RyanKim17920 · closed 2 months ago · 1 comment
#221 · Question: How to load model trained on earlier version of x-transformers · tmphex · closed 9 months ago · 3 comments
#220 · Init bias=0 in to_logits · ad8e · closed 9 months ago · 13 comments
#219 · kv cache breaks generation · ad8e · closed 9 months ago · 5 comments
#218 · how to set inputs to the right shape · emadkavousi · opened 9 months ago · 1 comment
#217 · "Stabilizing Transformer Training by Preventing Attention Entropy Collapse" improvement to ViT · catid · closed 8 months ago · 1 comment
#216 · Question: num_memory_tokens > 0 and return_mems = True · pfeatherstone · closed 9 months ago · 3 comments
#215 · Support for NormSoftmax · catid · closed 10 months ago · 16 comments
#214 · Simplifying Transformer Blocks (https://arxiv.org/abs/2311.01906) · Froskekongen · closed 10 months ago · 9 comments
#213 · Bert token type embedding · eyalmazuz · closed 10 months ago · 2 comments
#212 · ONNX export failed · pfeatherstone · opened 10 months ago · 14 comments
#211 · Masking for prepend_embeds · zqevans · closed 10 months ago · 7 comments
#210 · rotary embedding issues when training in mixed precision · zqevans · closed 10 months ago · 2 comments
#209 · [Bug] ContinuousTransformerWrapper - return_mems doesn't work · pfeatherstone · closed 10 months ago · 1 comment
#208 · Question: masking in token shifting · pfeatherstone · opened 10 months ago · 1 comment
#207 · Do you consider adding rwkv · chaodreaming · closed 10 months ago · 1 comment
#206 · fix missed imports in continuous.py · apage43 · closed 10 months ago · 1 comment
#205 · Question: return_mems and rotary_pos_emb · pfeatherstone · closed 10 months ago · 5 comments
#204 · Fix for Issue 203 · anthonyzhou-1 · closed 10 months ago · 1 comment
#203 · Using Rotary Positional Encoding with Continuous Wrapper · anthonyzhou-1 · closed 10 months ago · 1 comment
#202 · pre_norm_has_final_norm kwarg not used · sashakunitsyn · closed 11 months ago · 1 comment
#201 · Transformer Goat · darkman111a · closed 11 months ago · 1 comment
#200 · Question: attn_head_scale with use_scalenorm · pfeatherstone · closed 11 months ago · 1 comment
#199 · Feature request: different normalization layers at different depths · pfeatherstone · opened 11 months ago · 0 comments
#198 · Lack of the deep_norm variants of transformer · ZegangC · closed 11 months ago · 1 comment
#197 · High memory usage compared to Huggingface and Autocast has no effect? · LarsHill · opened 11 months ago · 3 comments
#196 · xval document · lucidrains · closed 11 months ago · 0 comments
#195 · add xval wrapper and autoregressive wrapper · lucidrains · closed 11 months ago · 0 comments
#194 · ContinuousTransformer num_memory_tokens bug · pfeatherstone · closed 11 months ago · 1 comment
#193 · [Question] difference between num_mem_kv and num_memory_tokens · pfeatherstone · closed 12 months ago · 7 comments
#192 · [Feature request] support num_memory_tokens in ContinuousTransformerWrapper · pfeatherstone · closed 12 months ago · 7 comments
#191 · cascading transformer comment typo · p0p4k · closed 1 year ago · 3 comments
#190 · Issue with qk scale between flash attention and normal attention when qk_norm is True · mistycube · closed 1 year ago · 2 comments
#189 · Issue with torch.compile · scopello · closed 1 year ago · 4 comments
#188 · Feature request: add local and reformer · samvanstroud · opened 1 year ago · 1 comment
#187 · Bugs in generation with cache and seq_start_pos · LouChao98 · closed 1 year ago · 8 comments
#186 · Support for flash attention 2 · scopello · closed 1 year ago · 8 comments
#185 · autoregressive wrapper generate should accommodate variable-length prefixes · LouChao98 · closed 1 year ago · 21 comments
#184 · Sampling questions/(issues?) · stas-sl · closed 1 year ago · 4 comments
#183 · Masks in cross attention · gpantaz · closed 1 year ago · 2 comments
#182 · Questions on rotary position embedding · LouChao98 · closed 1 year ago · 1 comment
#181 · Idea: hyperparameter searching · pfeatherstone · opened 1 year ago · 2 comments
#180 · Torchsummary not working · avocardio · opened 1 year ago · 0 comments
#179 · TransformerWrapper wants inputs x of type long · avocardio · closed 1 year ago · 3 comments
#178 · How to use this package correctly? · avocardio · closed 1 year ago · 0 comments
#177 · return hidden states of all layers · zhiaos · closed 1 year ago · 9 comments
#176 · Do you consider adding retnet · chaodreaming · closed 1 year ago · 15 comments