mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License · 6.36k stars · 355 forks
Issues (sorted newest first)
#84 · Tokenizer issue with Transformers 4.33.0 · PedemonteGiacomo · opened 1 week ago · 0 comments
#83 · Evaluation code and dataset release inquiry · DerrickYLJ · opened 2 weeks ago · 0 comments
#82 · How to visualize attention logits? · OStars · closed 1 month ago · 1 comment
#81 · what is the difference between window attention and sliding window recomputation · seeyourcell · closed 1 month ago · 0 comments
#80 · Progressively decreasing attention windows · Vorlent · opened 1 month ago · 0 comments
#79 · Using LLaVA model · JesseZZZZZ · opened 1 month ago · 0 comments
#78 · why `max_gen_len` is needed when considering `space_needed`? · Mr-lonely0 · opened 3 months ago · 0 comments
#77 · How to evaluate ppl? · Jiawei-Yang · opened 3 months ago · 1 comment
#76 · StreamEval · Zhangchaoran000 · opened 5 months ago · 0 comments
#75 · Support mistral-7b? · spring1915 · opened 6 months ago · 0 comments
#74 · Run with start_size=0 looks just fine · cyr0930 · opened 6 months ago · 0 comments
#73 · question about positions encoding when apply ROLLING KV CACHE WITH ATTENTION SINKS · bugm · closed 6 months ago · 1 comment
#72 · Error happened · ForrestPi · opened 6 months ago · 2 comments
#71 · Questions about ARC datasets · Zoeyyao27 · opened 7 months ago · 0 comments
#70 · How much GPU memory needed to run example ? · fangming-he · opened 7 months ago · 3 comments
#69 · Is there the way of parallel prompt ? · DavideHe · opened 7 months ago · 0 comments
#68 · Question about attention sink arising in pretrained models · kevinli573 · opened 7 months ago · 0 comments
#67 · Request for Code and Details on Figures 2 and 7 · ZhouZineng · opened 7 months ago · 0 comments
#66 · Questions Related to the Application and Results of Attention Sinks After the Paper · dsdanielpark · opened 7 months ago · 0 comments
#65 · Questions Regarding "Sink Tokens" · clarenceluo78 · opened 8 months ago · 0 comments
#64 · Doubts in "run_streaming_llama.py" file · Rishab9991 · opened 8 months ago · 0 comments
#63 · Question about Naive Sliding Window · kevinli573 · closed 8 months ago · 2 comments
#62 · why starting sink token is not a special token '\n'? · dhcode-cpp · closed 8 months ago · 2 comments
#61 · Results for Section 3.2 Rolling KV Cache (Without Pretraining) · timljj · opened 8 months ago · 1 comment
#60 · The position id for q · ofhwei · opened 8 months ago · 1 comment
#59 · The reason for the importance of the initial token. · freyamom · opened 8 months ago · 0 comments
#58 · [Feature Request] Support InternLM Model · vansin · opened 8 months ago · 1 comment
#57 · Can support to ChatGLM2? · KareEnges · opened 8 months ago · 0 comments
#56 · Enable explictly setting transformer model cache · JiaxuanYou · opened 8 months ago · 0 comments
#55 · question about Table 1 in paper · AresXD · opened 8 months ago · 1 comment
#54 · question about initial tokens · chaojiewang94 · opened 8 months ago · 2 comments
#53 · While streaming with sinks, how does the framework change the positional encodings of the KV cache without having to multiply with the Key and Value matrices? · Bhuvanesh09 · opened 8 months ago · 4 comments
#52 · Finetuning a model in the streaming mode ? · MohamedAliRashad · closed 8 months ago · 1 comment
#51 · question about re-computation · ysanimals · closed 8 months ago · 4 comments
#50 · Implementation of lama2 7b chat hf model · MuhammadIshaq-AI · opened 8 months ago · 7 comments
#49 · Implementing lama2 7b · MuhammadIshaq-AI · closed 8 months ago · 0 comments
#48 · Is code's position wrong with "kv_cache.evict_for_space" ? · DavideHe · closed 8 months ago · 2 comments
#47 · some question about paper · Vincentyua · closed 8 months ago · 1 comment
#46 · Does past_key_values be repeatedly compute? · freyamom · opened 8 months ago · 5 comments
#45 · How to use streaming llm to train a new model? is there any sample code . thansk · mega-cqz · closed 8 months ago · 1 comment
#44 · I'm (A Bit) Suspicious of Table 3. · FrederickGeek8 · closed 8 months ago · 1 comment
#43 · Questions on the demo results · BitCalSaul · closed 8 months ago · 2 comments
#42 · Question on intuition of "attention sink" and "alibi PE" · bowencohere · closed 8 months ago · 3 comments
#41 · Question about long input and difference between streaming-llm and dense attention. · hxs91 · closed 8 months ago · 2 comments
#40 · RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' · chnl · closed 8 months ago · 2 comments
#39 · Question about evaluation results and demo · hsm1997 · closed 8 months ago · 2 comments
#38 · How to answer the question in the middle of long input · yangzhj53 · opened 9 months ago · 0 comments
#37 · RuntimeError in run_streaming_llama.py When Using Accelerate with Streaming LLMa Model on A4500 GPU · ZexinLi0w0 · opened 9 months ago · 4 comments
#36 · Questions about "Run Streaming Llama Chatbot" · ChuanhongLi · closed 8 months ago · 3 comments
#35 · Can support to codellama34b? · willshion · closed 9 months ago · 1 comment