lucidrains / x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License
4.63k stars · 395 forks
issues
#225 · Removed biases breaks pre-trained models · zqevans · closed 8 months ago · 5 comments
#224 · Fix rotary embeddings when mems != None · pfeatherstone · closed 8 months ago · 11 comments
#223 · XL-recurrence with RotaryEmbedding and mems not working correctly. · pfeatherstone · closed 8 months ago · 34 comments
#222 · Enhancement: Multi Input/Output transformers · RyanKim17920 · closed 2 months ago · 1 comment
#221 · Question: How to load model trained on earlier version of x-transformers · tmphex · closed 9 months ago · 3 comments
#220 · Init bias=0 in to_logits · ad8e · closed 9 months ago · 13 comments
#219 · kv cache breaks generation · ad8e · closed 9 months ago · 5 comments
#218 · how to set inputs to the right shape · emadkavousi · opened 9 months ago · 1 comment
#217 · "Stabilizing Transformer Training by Preventing Attention Entropy Collapse" improvement to ViT · catid · closed 8 months ago · 1 comment
#216 · Question: num_memory_tokens > 0 and return_mems = True · pfeatherstone · closed 9 months ago · 3 comments
#215 · Support for NormSoftmax · catid · closed 10 months ago · 16 comments
#214 · Simplifying Transformer Blocks (https://arxiv.org/abs/2311.01906) · Froskekongen · closed 10 months ago · 9 comments
#213 · Bert token type embedding · eyalmazuz · closed 10 months ago · 2 comments
#212 · ONNX export failed · pfeatherstone · opened 10 months ago · 14 comments
#211 · Masking for prepend_embeds · zqevans · closed 10 months ago · 7 comments
#210 · rotary embedding issues when training in mixed precision · zqevans · closed 10 months ago · 2 comments
#209 · [Bug] ContinuousTransformerWrapper - return_mems doesn't work · pfeatherstone · closed 10 months ago · 1 comment
#208 · Question: masking in token shifting · pfeatherstone · opened 10 months ago · 1 comment
#207 · Do you consider adding rwkv · chaodreaming · closed 10 months ago · 1 comment
#206 · fix missed imports in continuous.py · apage43 · closed 10 months ago · 1 comment
#205 · Question: return_mems and rotary_pos_emb · pfeatherstone · closed 10 months ago · 5 comments
#204 · Fix for Issue 203 · anthonyzhou-1 · closed 10 months ago · 1 comment
#203 · Using Rotary Positional Encoding with Continuous Wrapper · anthonyzhou-1 · closed 10 months ago · 1 comment
#202 · pre_norm_has_final_norm kwarg not used · sashakunitsyn · closed 11 months ago · 1 comment
#201 · Transformer Goat · darkman111a · closed 11 months ago · 1 comment
#200 · Question: attn_head_scale with use_scalenorm · pfeatherstone · closed 11 months ago · 1 comment
#199 · Feature request: different normalization layers at different depths · pfeatherstone · opened 11 months ago · 0 comments
#198 · Lack of the deep_norm variants of transformer · ZegangC · closed 11 months ago · 1 comment
#197 · High memory usage compared to Huggingface and Autocast has no effect? · LarsHill · opened 11 months ago · 3 comments
#196 · xval document · lucidrains · closed 11 months ago · 0 comments
#195 · add xval wrapper and autoregressive wrapper · lucidrains · closed 11 months ago · 0 comments
#194 · ContinuousTransformer num_memory_tokens bug · pfeatherstone · closed 11 months ago · 1 comment
#193 · [Question] difference between num_mem_kv and num_memory_tokens · pfeatherstone · closed 12 months ago · 7 comments
#192 · [Feature request] support num_memory_tokens in ContinuousTransformerWrapper · pfeatherstone · closed 12 months ago · 7 comments
#191 · cascading transformer comment typo · p0p4k · closed 1 year ago · 3 comments
#190 · Issue with qk scale between flash attention and normal attention when qk_norm is True · mistycube · closed 1 year ago · 2 comments
#189 · Issue with torch.compile · scopello · closed 1 year ago · 4 comments
#188 · Feature request: add local and reformer · samvanstroud · opened 1 year ago · 1 comment
#187 · Bugs in generation with cache and seq_start_pos · LouChao98 · closed 1 year ago · 8 comments
#186 · Support for flash attention 2 · scopello · closed 1 year ago · 8 comments
#185 · autoregressive wrapper generate should accommodate variable-length prefixes · LouChao98 · closed 1 year ago · 21 comments
#184 · Sampling questions/(issues?) · stas-sl · closed 1 year ago · 4 comments
#183 · Masks in cross attention · gpantaz · closed 1 year ago · 2 comments
#182 · Questions on rotary position embedding · LouChao98 · closed 1 year ago · 1 comment
#181 · Idea: hyperparameter searching · pfeatherstone · opened 1 year ago · 2 comments
#180 · Torchsummary not working · avocardio · opened 1 year ago · 0 comments
#179 · TransformerWrapper wants inputs x of type long · avocardio · closed 1 year ago · 3 comments
#178 · How to use this package correctly? · avocardio · closed 1 year ago · 0 comments
#177 · return hidden states of all layers · zhiaos · closed 1 year ago · 9 comments
#176 · Do you consider adding retnet · chaodreaming · closed 1 year ago · 15 comments