jlamprou / Infini-Attention

Efficient Infinite Context Transformers with Infini-attention: PyTorch implementation + QwenMoE implementation + training script + 1M-context passkey retrieval
https://arxiv.org/abs/2404.07143

58 stars · 5 forks

Issues (newest first)

#7 Have you tried to segment the hidden states within the attention class?
ZackZikaiXiao · closed 4 months ago · 7 comments

#6 AttributeError: 'NoneType' object has no attribute 'to_legacy_cache' when training
ckybit · closed 4 months ago · 3 comments

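The traceback in #6 matches the cache handling in recent Hugging Face transformers modeling code, where the decoder converts its `DynamicCache` back to the legacy tuple format after the forward pass; that call fails when no cache object was ever created (e.g. `use_cache=False` during training). A minimal sketch of the usual guard, assuming the repo adapts that modeling code (the `to_legacy` helper name is hypothetical, not from the repo):

```python
from transformers.cache_utils import DynamicCache

# Hypothetical guard for issue #6: to_legacy_cache() exists only on a real
# cache object; when use_cache=False the decoder carries None instead.
def to_legacy(next_decoder_cache):
    if next_decoder_cache is None:  # e.g. training with use_cache=False
        return None
    return next_decoder_cache.to_legacy_cache()

print(to_legacy(None))            # None, instead of raising AttributeError
print(to_legacy(DynamicCache()))  # () for an empty cache
```
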
#5 Does the InfiniAttention class call repeat_kv twice?
Yukino256 · closed 4 months ago · 1 comment

#4 CUDA out of memory
riou-chen · closed 4 months ago · 1 comment

#3 Do we need `.backward(retain_graph=True)`?
Beomi · closed 4 months ago · 2 comments

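Issue #3 is the classic question for segment-recurrent models like Infini-attention: if the compressive memory carried across segments stays attached to the autograd graph, a second `loss.backward()` requires `retain_graph=True`. The usual alternative is to detach the carried state between segments (truncated backpropagation). A self-contained toy sketch of that pattern; `TinyRecurrent` is illustrative, not the repo's code:

```python
import torch
import torch.nn as nn

# Toy model whose state is carried across segments, like Infini-attention's
# compressive memory. Without the detach below, the second backward() would
# raise "Trying to backward through the graph a second time".
class TinyRecurrent(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, state):
        state = torch.tanh(self.proj(x) + state)  # new state uses old state
        return state.pow(2).mean(), state         # (loss, carried state)

model = TinyRecurrent()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
state = torch.zeros(8)
for _ in range(3):  # three "segments"
    loss, state = model(torch.randn(8), state)
    loss.backward()         # no retain_graph=True needed...
    opt.step()
    opt.zero_grad()
    state = state.detach()  # ...because the graph is cut between segments
```
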
#2 Missing import of `apply_rotary_pos_emb`
winglian · opened 4 months ago · 0 comments

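For issue #2, the missing function comes from Hugging Face transformers, which defines `apply_rotary_pos_emb` in each model family's modeling file. A minimal sketch of the standard RoPE application in the Llama/Qwen form (whether the repo imports it or inlines it is an assumption):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Rotate the last-dimension halves: (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
    # cos/sin: (batch, seq_len, head_dim); unsqueeze to broadcast over the
    # heads dimension of q and k: (batch, num_heads, seq_len, head_dim).
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

Alternatively, the one-line fix is `from transformers.models.llama.modeling_llama import apply_rotary_pos_emb`; the exact module path depends on which model family's modeling file the implementation adapts.
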
#1 Segmentation loop
winglian · closed 4 months ago · 5 comments