jlamprou Infini-Attention issues

jlamprou / Infini-Attention

Efficient Infinite Context Transformers with Infini-attention: PyTorch implementation + QwenMoE implementation + training script + 1M-context passkey retrieval

https://arxiv.org/abs/2404.07143

58 stars 5 forks

issues

Have you tried to segment the hidden states within the attention class?

#7 ZackZikaiXiao closed 4 months ago, 7 comments

AttributeError: 'NoneType' object has no attribute 'to_legacy_cache' when training

#6 ckybit closed 4 months ago, 3 comments

Does the class InfiniAttention call repeat_kv twice?

#5 Yukino256 closed 4 months ago, 1 comment

CUDA out of memory

#4 riou-chen closed 4 months ago, 1 comment

Do we need `.backward(retain_graph=True)`?

#3 Beomi closed 4 months ago, 2 comments

Missing import of apply_rotary_pos_emb

#2 winglian opened 4 months ago, 0 comments

Segmentation loop

#1 winglian closed 4 months ago, 5 comments
