lucidrains / x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License · 4.63k stars · 395 forks
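For context on the project the issues below discuss, here is a minimal usage sketch following the TransformerWrapper / Decoder interface documented in the project's README (hyperparameter values are illustrative, not prescriptive):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Wrap a decoder-only attention stack behind a token embedding and an
# output projection (the `to_logits` head referenced in issue #273).
model = TransformerWrapper(
    num_tokens = 20000,      # vocabulary size
    max_seq_len = 1024,      # maximum sequence length
    attn_layers = Decoder(
        dim = 512,           # model width
        depth = 6,           # number of transformer layers
        heads = 8            # attention heads per layer
    )
)

tokens = torch.randint(0, 20000, (1, 1024))  # dummy batch of token ids
logits = model(tokens)                       # shape: (1, 1024, 20000)
```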
Issues
#275 · paper for GLU Mult Bias? · TimS-ml · opened 4 days ago · 0 comments
#274 · Adding a latent vector at each layer · JLenzy · closed 2 weeks ago · 6 comments
#273 · encoder.to_logits.weight doesn't update · guillaumeguy · closed 2 weeks ago · 3 comments
#272 · Can't pickle <class 'torch.nn.attention._SDPBackend'>: attribute lookup _SDPBackend on torch.nn.attention failed · guillaumeguy · closed 3 weeks ago · 2 comments
#271 · Potential bug in `model.generate` · guillaumeguy · closed 3 weeks ago · 5 comments
#270 · [Question] Embedding Inputs to Transformer [batch, seq_len, embedding_dim] · francotheengineer · closed 3 weeks ago · 2 comments
#269 · Update README.md · brahmirathodd · closed 1 month ago · 0 comments
#268 · [Feature Request] CoPE from Meta · JLenzy · closed 1 month ago · 3 comments
#267 · Allow passing in a pre-existing TokenEmbedding into TransformerWrapper · Waino · closed 1 month ago · 2 comments
#266 · Allow passing in a pre-existing TokenEmbedding into TransformerWrapper · Waino · closed 1 month ago · 0 comments
#265 · Pytorch warning with autocast · pfeatherstone · closed 1 month ago · 1 comment
#264 · Classification with x-transformers · RyanKim17920 · opened 2 months ago · 3 comments
#263 · Decoder logits are NaN when cross-attention context is all padding · giorgioskij · closed 2 months ago · 2 comments
#262 · Small paper ideas to be added · RyanKim17920 · opened 2 months ago · 3 comments
#261 · Should the mask option to AttentionLayers(boolean) increase memory? · blasscoc · closed 2 months ago · 1 comment
#260 · Upgrade sdpa kernel · Ryu1845 · closed 3 weeks ago · 4 comments
#259 · Feature request: Multi-Head Latent Attention Support · nanowell · closed 2 months ago · 1 comment
#258 · [Question] Why is RotaryEmbedding not used when cross attending? · pfeatherstone · opened 2 months ago · 1 comment
#257 · Is this the same "X-transformer" that being used in "X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism" paper? · argadewanata · closed 3 months ago · 1 comment
#256 · Random lack of gradients · Baran-phys · closed 3 months ago · 1 comment
#255 · Problem with cache and memory · Baran-phys · closed 3 months ago · 0 comments
#254 · Enable flash attention does not support BFloat16? · Kaimary · closed 3 months ago · 1 comment
#253 · How to use "src_key_padding_mask" · LutherLin · closed 4 months ago · 3 comments
#252 · Sinusoidal embedding order choice different from original definition · gordicaleksa · closed 4 months ago · 1 comment
#251 · migrate to less confusing way of doing rotary · lucidrains · closed 4 months ago · 1 comment
#250 · RoPE inconsistency (2-dim subspaces choice) · gordicaleksa · closed 4 months ago · 0 comments
#249 · Correct interaction between CLS token and RoPE · oasidorshin · closed 5 months ago · 5 comments
#248 · Question: problem with xval implementation · HarshaSatyavardhan · closed 5 months ago · 6 comments
#247 · [Bug] Error when `rotary_pos_emb` set to True in cross attention · BakerBunker · closed 5 months ago · 3 comments
#246 · Was it a clerical error ? ScaleNorm.g init form dim ** -0.5. I think it should be dim ** 0.5 · junphine · closed 5 months ago · 1 comment
#245 · [Question] very small attention scores · pfeatherstone · closed 4 months ago · 7 comments
#244 · Pass custom scale to flash attention · Subuday · closed 7 months ago · 5 comments
#243 · ContinuousTransformerWrapper: turning on absolute positional embedding: mirror TransformerWrapper · pfeatherstone · closed 7 months ago · 2 comments
#242 · [Bug] XL-recurrence with AlibiPositionalBias and mems not working correctly · pfeatherstone · closed 5 months ago · 17 comments
#241 · Question: rotary embeddings and bad length extrapolation · pfeatherstone · closed 4 months ago · 1 comment
#240 · How can I add custom attention masks to a Decoder? · DerEchteFeuerpfeil · closed 7 months ago · 3 comments
#239 · Confusion about image->caption example · mtran14 · opened 8 months ago · 1 comment
#238 · Generation for PaLI? · BurgerAndreas · opened 8 months ago · 0 comments
#237 · `layer_mem` is unbound (when called from `ContinuousTransformerWrapper`) · amitkparekh · closed 8 months ago · 6 comments
#236 · Multi Input/output transformers · RyanKim17920 · closed 2 months ago · 6 comments
#235 · Multi Input/Output transformers · RyanKim17920 · closed 8 months ago · 1 comment
#234 · Fix xpos when using mems · pfeatherstone · closed 8 months ago · 3 comments
#233 · RotaryEmbedding XPOS doesn't work with mems · pfeatherstone · closed 3 months ago · 5 comments
#232 · [Minor; noob question] Uniform distribution instead of normal · p0p4k · opened 8 months ago · 0 comments
#231 · Update x_transformers.py · notprime · closed 8 months ago · 9 comments
#230 · How to build optimizer · pfeatherstone · closed 8 months ago · 9 comments
#229 · Question: How to implement rel_pos_bias in cross_attention? · alexdemartos · closed 4 months ago · 13 comments
#228 · attn_num_mem_kv > 0 and attn_one_kv_head = True error · pfeatherstone · closed 8 months ago · 8 comments
#227 · Adding memmask to ContinuousTransformerWrapper · pfeatherstone · closed 8 months ago · 3 comments
#226 · Seq len missing in rotary embedding · raganato · closed 8 months ago · 3 comments