lucidrains / x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License · 4.63k stars · 395 forks
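For context on the project the issues below discuss, here is a minimal usage sketch following the TransformerWrapper / Decoder interface documented in the project's README (hyperparameter values are illustrative, not prescriptive):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Wrap a decoder-only attention stack behind a token embedding and an
# output projection (the `to_logits` head referenced in issue #273).
model = TransformerWrapper(
    num_tokens = 20000,      # vocabulary size
    max_seq_len = 1024,      # maximum sequence length
    attn_layers = Decoder(
        dim = 512,           # model width
        depth = 6,           # number of transformer layers
        heads = 8            # attention heads per layer
    )
)

tokens = torch.randint(0, 20000, (1, 1024))  # dummy batch of token ids
logits = model(tokens)                       # shape: (1, 1024, 20000)
```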
Issues
#275 · paper for GLU Mult Bias? · TimS-ml · opened 4 days ago · 0 comments
#274 · Adding a latent vector at each layer · JLenzy · closed 2 weeks ago · 6 comments
#273 · encoder.to_logits.weight doesn't update · guillaumeguy · closed 2 weeks ago · 3 comments
#272 · Can't pickle <class 'torch.nn.attention._SDPBackend'>: attribute lookup _SDPBackend on torch.nn.attention failed · guillaumeguy · closed 3 weeks ago · 2 comments
#271 · Potential bug in `model.generate` · guillaumeguy · closed 3 weeks ago · 5 comments
#270 · [Question] Embedding Inputs to Transformer [batch, seq_len, embedding_dim] · francotheengineer · closed 3 weeks ago · 2 comments
#269 · Update README.md · brahmirathodd · closed 1 month ago · 0 comments
#268 · [Feature Request] CoPE from Meta · JLenzy · closed 1 month ago · 3 comments
#267 · Allow passing in a pre-existing TokenEmbedding into TransformerWrapper · Waino · closed 1 month ago · 2 comments
#266 · Allow passing in a pre-existing TokenEmbedding into TransformerWrapper · Waino · closed 1 month ago · 0 comments
#265 · Pytorch warning with autocast · pfeatherstone · closed 1 month ago · 1 comment
#264 · Classification with x-transformers · RyanKim17920 · opened 2 months ago · 3 comments
#263 · Decoder logits are NaN when cross-attention context is all padding · giorgioskij · closed 2 months ago · 2 comments
#262 · Small paper ideas to be added · RyanKim17920 · opened 2 months ago · 3 comments
#261 · Should the mask option to AttentionLayers(boolean) increase memory? · blasscoc · closed 2 months ago · 1 comment
#260 · Upgrade sdpa kernel · Ryu1845 · closed 3 weeks ago · 4 comments
#259 · Feature request: Multi-Head Latent Attention Support · nanowell · closed 2 months ago · 1 comment
#258 · [Question] Why is RotaryEmbedding not used when cross attending? · pfeatherstone · opened 2 months ago · 1 comment
#257 · Is this the same "X-transformer" that being used in "X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism" paper? · argadewanata · closed 3 months ago · 1 comment
#256 · Random lack of gradients · Baran-phys · closed 3 months ago · 1 comment
#255 · Problem with cache and memory · Baran-phys · closed 3 months ago · 0 comments
#254 · Enable flash attention does not support BFloat16? · Kaimary · closed 3 months ago · 1 comment
#253 · How to use "src_key_padding_mask" · LutherLin · closed 4 months ago · 3 comments
#252 · Sinusoidal embedding order choice different from original definition · gordicaleksa · closed 4 months ago · 1 comment
#251 · migrate to less confusing way of doing rotary · lucidrains · closed 4 months ago · 1 comment
#250 · RoPE inconsistency (2-dim subspaces choice) · gordicaleksa · closed 4 months ago · 0 comments
#249 · Correct interaction between CLS token and RoPE · oasidorshin · closed 5 months ago · 5 comments
#248 · Question: problem with xval implementation · HarshaSatyavardhan · closed 5 months ago · 6 comments
#247 · [Bug] Error when `rotary_pos_emb` set to True in cross attention · BakerBunker · closed 5 months ago · 3 comments
#246 · Was it a clerical error ? ScaleNorm.g init form dim ** -0.5. I think it should be dim ** 0.5 · junphine · closed 5 months ago · 1 comment
#245 · [Question] very small attention scores · pfeatherstone · closed 4 months ago · 7 comments
#244 · Pass custom scale to flash attention · Subuday · closed 7 months ago · 5 comments
#243 · ContinuousTransformerWrapper: turning on absolute positional embedding: mirror TransformerWrapper · pfeatherstone · closed 7 months ago · 2 comments
#242 · [Bug] XL-recurrence with AlibiPositionalBias and mems not working correctly · pfeatherstone · closed 5 months ago · 17 comments
#241 · Question: rotary embeddings and bad length extrapolation · pfeatherstone · closed 4 months ago · 1 comment
#240 · How can I add custom attention masks to a Decoder? · DerEchteFeuerpfeil · closed 7 months ago · 3 comments
#239 · Confusion about image->caption example · mtran14 · opened 8 months ago · 1 comment
#238 · Generation for PaLI? · BurgerAndreas · opened 8 months ago · 0 comments
#237 · `layer_mem` is unbound (when called from `ContinuousTransformerWrapper`) · amitkparekh · closed 8 months ago · 6 comments
#236 · Multi Input/output transformers · RyanKim17920 · closed 2 months ago · 6 comments
#235 · Multi Input/Output transformers · RyanKim17920 · closed 8 months ago · 1 comment
#234 · Fix xpos when using mems · pfeatherstone · closed 8 months ago · 3 comments
#233 · RotaryEmbedding XPOS doesn't work with mems · pfeatherstone · closed 3 months ago · 5 comments
#232 · [Minor; noob question] Uniform distribution instead of normal · p0p4k · opened 8 months ago · 0 comments
#231 · Update x_transformers.py · notprime · closed 8 months ago · 9 comments
#230 · How to build optimizer · pfeatherstone · closed 8 months ago · 9 comments
#229 · Question: How to implement rel_pos_bias in cross_attention? · alexdemartos · closed 4 months ago · 13 comments
#228 · attn_num_mem_kv > 0 and attn_one_kv_head = True error · pfeatherstone · closed 8 months ago · 8 comments
#227 · Adding memmask to ContinuousTransformerWrapper · pfeatherstone · closed 8 months ago · 3 comments
#226 · Seq len missing in rotary embedding · raganato · closed 8 months ago · 3 comments