lucidrains / x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
MIT License · 4.37k stars · 370 forks
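For context, a minimal usage sketch of the library, adapted from the project README; the vocabulary size, model dimensions, and sequence length below are illustrative placeholders, not recommendations:

    import torch
    from x_transformers import TransformerWrapper, Decoder

    # a small decoder-only language model; all sizes are placeholders
    model = TransformerWrapper(
        num_tokens = 20000,        # vocabulary size
        max_seq_len = 1024,        # maximum sequence length
        attn_layers = Decoder(
            dim = 512,
            depth = 6,
            heads = 8
        )
    )

    x = torch.randint(0, 20000, (1, 1024))   # (batch, seq) of token ids
    logits = model(x)                         # (1, 1024, 20000)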
Issues (newest first)
#257 · Is this the same "X-Transformer" that is being used in the paper "X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism"? · argadewanata · closed 1 week ago · 1 comment
#256 · Random lack of gradients · Baran-phys · closed 1 month ago · 1 comment
#255 · Problem with cache and memory · Baran-phys · closed 1 month ago · 0 comments
#254 · Enabling flash attention does not support BFloat16? · Kaimary · closed 1 week ago · 1 comment
#253 · How to use "src_key_padding_mask" · LutherLin · closed 1 month ago · 2 comments
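On #253: x-transformers does not take torch.nn.Transformer's src_key_padding_mask argument; the closest equivalent appears to be the boolean mask keyword on the forward pass, where True marks positions that should be attended to. A minimal sketch under that assumption, with illustrative sizes:

    import torch
    from x_transformers import TransformerWrapper, Encoder

    model = TransformerWrapper(
        num_tokens = 20000,
        max_seq_len = 1024,
        attn_layers = Encoder(dim = 512, depth = 6, heads = 8)
    )

    x = torch.randint(0, 20000, (2, 1024))   # (batch, seq) token ids
    mask = torch.ones(2, 1024).bool()         # True = keep / attend to this position
    mask[:, 512:] = False                      # e.g. the second half is padding

    out = model(x, mask = mask)                # padded positions are ignored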
#252 · Sinusoidal embedding order choice different from original definition · gordicaleksa · closed 1 month ago · 1 comment
#251 · Migrate to a less confusing way of doing rotary · lucidrains · closed 1 month ago · 1 comment
#250 · RoPE inconsistency (2-dim subspaces choice) · gordicaleksa · closed 1 month ago · 0 comments
#249 · Correct interaction between CLS token and RoPE · oasidorshin · closed 2 months ago · 5 comments
#248 · Question: problem with xval implementation · HarshaSatyavardhan · closed 2 months ago · 5 comments
#247 · [Bug] Error when `rotary_pos_emb` set to True in cross attention · BakerBunker · closed 2 months ago · 3 comments
#246 · Was it a clerical error? ScaleNorm.g is initialized from dim ** -0.5; I think it should be dim ** 0.5 · junphine · closed 2 months ago · 1 comment
#245 · [Question] Very small attention scores · pfeatherstone · closed 1 month ago · 7 comments
#244 · Pass custom scale to flash attention · Subuday · closed 4 months ago · 5 comments
#243 · ContinuousTransformerWrapper: turning on absolute positional embedding should mirror TransformerWrapper · pfeatherstone · closed 4 months ago · 2 comments
#242 · [Bug] XL-recurrence with AlibiPositionalBias and mems not working correctly · pfeatherstone · closed 2 months ago · 17 comments
#241 · Question: rotary embeddings and bad length extrapolation · pfeatherstone · closed 1 month ago · 1 comment
#240 · How can I add custom attention masks to a Decoder? · DerEchteFeuerpfeil · closed 4 months ago · 3 comments
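On #240: a hedged sketch of passing a custom attention mask. The attn_mask keyword and its broadcast shape are assumptions about the library's forward signature and may differ between versions; check the Attention layer in x_transformers.py for the installed release.

    import torch
    from x_transformers import TransformerWrapper, Decoder

    model = TransformerWrapper(
        num_tokens = 20000,
        max_seq_len = 256,
        attn_layers = Decoder(dim = 512, depth = 4, heads = 8)
    )

    x = torch.randint(0, 20000, (1, 256))

    # custom boolean attention mask, True = this (query, key) pair may attend;
    # assumed to broadcast over (batch, heads, query_len, key_len)
    attn_mask = torch.ones(256, 256).tril().bool()   # e.g. an explicit causal mask

    logits = model(x, attn_mask = attn_mask)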
#239 · Confusion about image->caption example · mtran14 · opened 5 months ago · 1 comment
#238 · Generation for PaLI? · BurgerAndreas · opened 5 months ago · 0 comments
#237 · `layer_mem` is unbound (when called from `ContinuousTransformerWrapper`) · amitkparekh · closed 5 months ago · 6 comments
#236 · Multi Input/output transformers · RyanKim17920 · opened 5 months ago · 5 comments
#235 · Multi Input/Output transformers · RyanKim17920 · closed 5 months ago · 1 comment
#234 · Fix xpos when using mems · pfeatherstone · closed 5 months ago · 3 comments
#233 · RotaryEmbedding XPOS doesn't work with mems · pfeatherstone · closed 2 days ago · 5 comments
#232 · [Minor; noob question] Uniform distribution instead of normal · p0p4k · opened 5 months ago · 0 comments
#231 · Update x_transformers.py · notprime · closed 5 months ago · 9 comments
#230 · How to build optimizer · pfeatherstone · closed 5 months ago · 9 comments
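On #230: the library itself does not appear to ship an optimizer builder, so a plain PyTorch optimizer over model.parameters() is the usual route. A minimal sketch; the learning rate, weight decay, and the toy loss target are illustrative only:

    import torch
    import torch.nn.functional as F
    from x_transformers import TransformerWrapper, Decoder

    model = TransformerWrapper(
        num_tokens = 20000,
        max_seq_len = 1024,
        attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
    )

    # any standard PyTorch optimizer works; hyperparameters are placeholders
    optimizer = torch.optim.AdamW(model.parameters(), lr = 3e-4, weight_decay = 1e-2)

    x = torch.randint(0, 20000, (2, 1024))
    logits = model(x)
    loss = F.cross_entropy(
        logits.transpose(1, 2),   # (batch, vocab, seq) as cross_entropy expects
        x                         # toy target: predict the input itself
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()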
#229 · Question: How to implement rel_pos_bias in cross_attention? · alexdemartos · closed 2 months ago · 13 comments
#228 · attn_num_mem_kv > 0 and attn_one_kv_head = True error · pfeatherstone · closed 5 months ago · 8 comments
#227 · Adding memmask to ContinuousTransformerWrapper · pfeatherstone · closed 5 months ago · 3 comments
#226 · Seq len missing in rotary embedding · raganato · closed 6 months ago · 3 comments
#225 · Removing biases breaks pre-trained models · zqevans · closed 5 months ago · 5 comments
#224 · Fix rotary embeddings when mems != None · pfeatherstone · closed 6 months ago · 11 comments
#223 · XL-recurrence with RotaryEmbedding and mems not working correctly · pfeatherstone · closed 6 months ago · 34 comments
#222 · Enhancement: Multi Input/Output transformers · RyanKim17920 · opened 6 months ago · 1 comment
#221 · Question: How to load a model trained on an earlier version of x-transformers · tmphex · closed 6 months ago · 3 comments
#220 · Init bias=0 in to_logits · ad8e · closed 6 months ago · 13 comments
#219 · kv cache breaks generation · ad8e · closed 6 months ago · 5 comments
#218 · How to set inputs to the right shape · emadkavousi · opened 6 months ago · 1 comment
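On #218: a sketch of the expected input shapes. TransformerWrapper takes integer token ids of shape (batch, seq_len), while ContinuousTransformerWrapper takes float features of shape (batch, seq_len, dim_in). The dim_in / dim_out keyword names are assumed from the README and may differ across versions; all sizes are placeholders.

    import torch
    from x_transformers import (
        TransformerWrapper,
        ContinuousTransformerWrapper,
        Decoder,
        Encoder,
    )

    # token-based: input is (batch, seq_len) integer ids
    token_model = TransformerWrapper(
        num_tokens = 20000,
        max_seq_len = 1024,
        attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
    )
    ids = torch.randint(0, 20000, (2, 1024))
    logits = token_model(ids)          # (2, 1024, 20000)

    # continuous: input is (batch, seq_len, dim_in) float features
    cont_model = ContinuousTransformerWrapper(
        dim_in = 32,
        dim_out = 32,
        max_seq_len = 1024,
        attn_layers = Encoder(dim = 512, depth = 6, heads = 8)
    )
    feats = torch.randn(2, 1024, 32)
    out = cont_model(feats)            # (2, 1024, 32)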
"Stabilizing Transformer Training by Preventing Attention Entropy Collapse" improvement to ViT
#217
catid
closed
6 months ago
1
Question: num_memory_tokens > 0 and return_mems = True
#216
pfeatherstone
closed
6 months ago
3
Support for NormSoftmax
#215
catid
closed
7 months ago
16
Simplifying Transformer Blocks (https://arxiv.org/abs/2311.01906)
#214
Froskekongen
closed
7 months ago
9
Bert token type embedding
#213
eyalmazuz
closed
7 months ago
2
ONNX export failed
#212
pfeatherstone
opened
7 months ago
14
Masking for prepend_embeds
#211
zqevans
closed
7 months ago
7
rotary embedding issues when training in mixed precision
#210
zqevans
closed
7 months ago
2
[Bug] ContinuousTransformerWrapper - return_mems doens't work
#209
pfeatherstone
closed
7 months ago
1
Question: masking in token shifting
#208
pfeatherstone
opened
7 months ago
1