lucidrains / x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License · 4.63k stars · 395 forks
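For context on the issues below, here is a minimal usage sketch following the interface shown in the repository's README; the hyperparameter values are illustrative, and `attn_flash` is the flag discussed in #173:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Minimal autoregressive setup per the README; sizes are illustrative.
model = TransformerWrapper(
    num_tokens = 20000,          # vocabulary size
    max_seq_len = 1024,          # maximum sequence length
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8,
        attn_flash = True        # flash attention flag discussed in #173
    )
)

x = torch.randint(0, 20000, (1, 1024))
logits = model(x)                # (1, 1024, 20000)
```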
Issues
#175  Unused Dropout Parameter (XiaoWang-Github, closed 1 year ago, 1 comment)
#174  Dimension mismatch with cross attention (pradeep-pyro, closed 1 year ago, 4 comments)
#173  Feature request: don't specify attn_flash. Select when possible (pfeatherstone, opened 1 year ago, 2 comments)
#172  Question: normalizing mask shape (pfeatherstone, closed 1 year ago, 4 comments)
#171  NTK-aware Scaled RoPE (Jingyu-Fan, closed 1 year ago, 41 comments)
#170  Is there a plan to handle the inference slowness? e.g. KV cache (liuzhuang1024, closed 1 year ago, 9 comments)
#169  correct input (chogamy, closed 1 year ago, 1 comment)
#168  Back-propagation on Mask for attention layers (gaasher, opened 1 year ago, 1 comment)
#167  ContinuousTransformerWrapper returning list of tensors as opposed to stack of tensors in 1.16.20 (gaasher, closed 1 year ago, 1 comment)
#166  Feature request: support return_mems in ContinuousTransformerWrapper (pfeatherstone, opened 1 year ago, 16 comments)
#165  [Bug] attn_sparse_topk: NameError: name 'dots' is not defined (Jingyu-Fan, closed 1 year ago, 2 comments)
#164  Attention mask, is True True? (TKassis, closed 1 year ago, 7 comments; see the mask-convention sketch after this list)
#163  RuntimeError: output with shape [1, 1, 16, 16] doesn't match the broadcast shape [1, 8, 16, 16] (BMontens, closed 1 year ago, 6 comments)
#162  GPT training problem (phuvinhnguyen, opened 1 year ago, 0 comments)
#161  Replace .triu calls to allow ONNX export for CPU runtime (jorgetavares, closed 1 year ago, 7 comments)
#160  Incorrect boolean mask in flash attention (stoprightthere, closed 1 year ago, 7 comments)
#159  How to use the RoPE scaling with x_transformers? (cutoken, opened 1 year ago, 21 comments)
#158  LoRA fine-tuning? (hugofloresgarcia, closed 1 year ago, 2 comments)
#157  Any plans to make a JAX iteration of this repository? We really need it (kyegomez, opened 1 year ago, 1 comment)
#156  Small syntax error in LayerIntermediates (prestonyun, closed 1 year ago, 1 comment)
#155  Any plans to implement the new flash sparse attention? (kyegomez, closed 1 year ago, 5 comments)
#154  Fix weight tying bug (RameshArvind, closed 1 year ago, 1 comment)
#153  ALiBi: buffered bias slicing gets confusing when `i != j` (antony-frolov, opened 1 year ago, 0 comments)
#152  AlibiPositionalBias: slicing buffered bias (antony-frolov, closed 1 year ago, 1 comment)
#151  Question about the over-smoothing problem (Baran-phys, closed 4 months ago, 2 comments)
#150  Dimension mismatch in attention (v1.16.1+) (pradeep-pyro, closed 1 year ago, 2 comments)
#149  Cascading heads (lucidrains, closed 1 year ago, 0 comments)
#148  exploring cascading heads from efficientvit paper, proposed for reduc… (lucidrains, closed 1 year ago, 0 comments)
#147  Bug fix in the forward method of the LearnedAlibiPositionalBias class (taemincho, closed 1 year ago, 1 comment)
#146  Question about ViTransformerWrapper (XiaoWang-Github, closed 1 year ago, 3 comments)
#145  Question: clarification of ResiDual implementation (alstonlo, closed 1 year ago, 6 comments)
#144  Flash is not flash (liujuncn, opened 1 year ago, 1 comment)
#143  RuntimeError: No available kernel. Aborting execution. (kyegomez, opened 1 year ago, 12 comments)
#142  Enhanced recurrence question (danieltudosiu, closed 1 year ago, 4 comments)
#141  Could not call torch.save on the model (frederikfab, closed 1 year ago, 1 comment)
#140  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 0 comments)
#139  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 0 comments)
#138  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 0 comments)
#137  Feature Request: Hyena Attention (vvvm23, closed 1 year ago, 6 comments)
#136  Feature request: use scaled_dot_product_attention() (pfeatherstone, closed 1 year ago, 6 comments)
#135  Suggestion for OOD extrapolation power of Transformers (Baran-phys, closed 1 year ago, 5 comments)
#134  Cross attention between tensors of different shapes (Baran-phys, closed 1 year ago, 4 comments)
#133  Feature request: generate top k sequences (yzhang-github-pub, opened 1 year ago, 2 comments)
#132  Implementing a small ViT-VQGAN (OhGreat, closed 3 months ago, 0 comments)
#131  typo? (yzhang-github-pub, closed 1 year ago, 1 comment)
#130  test out stable entropy hypothesis (wip) (lucidrains, closed 1 year ago, 0 comments)
#129  BERT Training and Word-Level Tokenization (XiaoWang-Github, closed 1 year ago, 1 comment)
#128  Allow specifying scaled_sinu_pos_emb using XTransformers Interface (ncoop57, closed 1 year ago, 1 comment)
#127  Possible Bug for Residual (XiaoWang-Github, closed 1 year ago, 2 comments)
#126  ONNX export for the enwik8 example (bitdom8, opened 1 year ago, 2 comments)
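Several issues above (#172, #164, #160) turn on the boolean attention-mask convention. As a point of reference, a minimal sketch assuming the convention from the README, where True marks a position that is kept and attended to; shapes and values are illustrative:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
)

x = torch.randint(0, 20000, (2, 1024))

# Boolean key-padding mask of shape (batch, seq_len):
#   True  -> real token, attended to
#   False -> padding, ignored by attention
mask = torch.ones(2, 1024, dtype = torch.bool)
mask[:, 512:] = False            # e.g. second half of each sequence is padding

logits = model(x, mask = mask)   # (2, 1024, 20000)
```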