mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License · 6.59k stars · 361 forks
Issues (newest first)
#86 · Can you provide the code related to the visualization in the paper? · opened by micelvrice 1 month ago · 0 comments
#85 · [question] Does streaming-llm focus on accelerating decoding stage? How about the prefilling stage? · opened by Code24Man 2 months ago · 0 comments
#84 · Tokenizer issue with Transformers 4.33.0 · opened by PedemonteGiacomo 3 months ago · 0 comments
#83 · Evaluation code and dataset release inquiry · opened by DerrickYLJ 3 months ago · 0 comments
#82 · How to visualize attention logits? · closed by OStars 4 months ago · 1 comment
#81 · what is the difference between window attention and sliding window recomputation · closed by seeyourcell 4 months ago · 0 comments
#80 · Progressively decreasing attention windows · opened by Vorlent 4 months ago · 0 comments
#79 · Using LLaVA model · opened by JesseZZZZZ 4 months ago · 0 comments
#78 · why `max_gen_len` is needed when considering `space_needed`? · opened by Mr-lonely0 6 months ago · 0 comments
#77 · How to evaluate ppl? · opened by Jiawei-Yang 6 months ago · 2 comments
#76 · StreamEval · opened by Zhangchaoran000 8 months ago · 0 comments
#75 · Support mistral-7b? · opened by spring1915 9 months ago · 0 comments
#74 · Run with start_size=0 looks just fine · opened by cyr0930 9 months ago · 0 comments
#73 · question about positions encoding when apply ROLLING KV CACHE WITH ATTENTION SINKS · closed by bugm 9 months ago · 1 comment
#72 · Error happened · opened by ForrestPi 9 months ago · 2 comments
#71 · Questions about ARC datasets · opened by Zoeyyao27 10 months ago · 0 comments
#70 · How much GPU memory needed to run example? · opened by fangming-he 10 months ago · 3 comments
#69 · Is there the way of parallel prompt? · opened by DavideHe 10 months ago · 0 comments
#68 · Question about attention sink arising in pretrained models · opened by kevinli573 10 months ago · 0 comments
#67 · Request for Code and Details on Figures 2 and 7 · opened by ZhouZineng 10 months ago · 0 comments
#66 · Questions Related to the Application and Results of Attention Sinks After the Paper · opened by dsdanielpark 10 months ago · 0 comments
#65 · Questions Regarding "Sink Tokens" · opened by clarenceluo78 11 months ago · 0 comments
#64 · Doubts in "run_streaming_llama.py" file · opened by Rishab9991 11 months ago · 0 comments
#63 · Question about Naive Sliding Window · closed by kevinli573 11 months ago · 2 comments
#62 · why starting sink token is not a special token '\n'? · closed by dhcode-cpp 11 months ago · 2 comments
#61 · Results for Section 3.2 Rolling KV Cache (Without Pretraining) · opened by timljj 11 months ago · 1 comment
#60 · The position id for q · opened by ofhwei 11 months ago · 1 comment
#59 · The reason for the importance of the initial token. · opened by freyamom 11 months ago · 0 comments
#58 · [Feature Request] Support InternLM Model · opened by vansin 11 months ago · 1 comment
#57 · Can support to ChatGLM2? · opened by KareEnges 11 months ago · 0 comments
#56 · Enable explictly setting transformer model cache · opened by JiaxuanYou 11 months ago · 0 comments
#55 · question about Table 1 in paper · opened by AresXD 11 months ago · 1 comment
#54 · question about initial tokens · opened by chaojiewang94 11 months ago · 2 comments
#53 · While streaming with sinks, how does the framework change the positional encodings of the KV cache without having to multiply with the Key and Value matrices? · opened by Bhuvanesh09 11 months ago · 4 comments
#52 · Finetuning a model in the streaming mode? · closed by MohamedAliRashad 11 months ago · 1 comment
#51 · question about re-computation · closed by ysanimals 11 months ago · 4 comments
#50 · Implementation of lama2 7b chat hf model · opened by MuhammadIshaq-AI 11 months ago · 7 comments
#49 · Implementing lama2 7b · closed by MuhammadIshaq-AI 11 months ago · 0 comments
#48 · Is code's position wrong with "kv_cache.evict_for_space"? · closed by DavideHe 11 months ago · 2 comments
#47 · some question about paper · closed by Vincentyua 11 months ago · 1 comment
#46 · Does past_key_values be repeatedly compute? · opened by freyamom 11 months ago · 5 comments
#45 · How to use streaming llm to train a new model? is there any sample code . thansk · closed by mega-cqz 11 months ago · 1 comment
#44 · I'm (A Bit) Suspicious of Table 3. · closed by FrederickGeek8 11 months ago · 1 comment
#43 · Questions on the demo results · closed by BitCalSaul 11 months ago · 2 comments
#42 · Question on intuition of "attention sink" and "alibi PE" · closed by bowencohere 11 months ago · 3 comments
#41 · Question about long input and difference between streaming-llm and dense attention. · closed by hxs91 11 months ago · 2 comments
#40 · RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' · closed by chnl 11 months ago · 2 comments
#39 · Question about evaluation results and demo · closed by hsm1997 11 months ago · 2 comments
#38 · How to answer the question in the middle of long input · opened by yangzhj53 12 months ago · 0 comments
#37 · RuntimeError in run_streaming_llama.py When Using Accelerate with Streaming LLMa Model on A4500 GPU · opened by ZexinLi0w0 12 months ago · 4 comments
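Several of the issues above (#78 on `space_needed` and `max_gen_len`, #73 on position encodings under the rolling KV cache, #48 on where eviction is called) revolve around the same mechanism: the cache keeps a few initial "attention sink" tokens plus a sliding window of recent tokens, and evicts the middle before appending new ones. The following is a minimal illustrative sketch of that eviction policy only, using a plain Python list as a stand-in for per-layer key/value tensors; the function name and signature are hypothetical and are not the repository's actual API.

```python
# Sketch of "attention sinks + rolling window" KV-cache eviction:
# always keep the first `start_size` cached tokens (the sinks) and
# the most recent tokens, dropping the middle so that `space_needed`
# new tokens still fit in a budget of start_size + recent_size slots.
# `cache` is a simplified stand-in: a list of token indices.

def evict_for_space(cache, start_size, recent_size, space_needed):
    """Return a cache with room for `space_needed` incoming tokens."""
    budget = start_size + recent_size
    if len(cache) + space_needed <= budget:
        return cache  # enough room already, nothing to evict
    # Keep the sinks plus just enough of the tail that, after the
    # `space_needed` new tokens are appended, we are back within budget.
    keep_tail = max(recent_size - space_needed, 0)
    return cache[:start_size] + cache[len(cache) - keep_tail:]

cache = list(range(10))  # tokens 0..9 already cached
cache = evict_for_space(cache, start_size=4, recent_size=4, space_needed=2)
print(cache)  # [0, 1, 2, 3, 8, 9]: sinks kept, middle evicted
```

This also shows why a `max_gen_len`-style reservation (issue #78) matters: eviction must leave room for every token the model is about to generate, not just the prompt, otherwise the cache overflows mid-generation. Note the sketch deliberately ignores the position-id remapping that the real method needs (issue #73), where cached tokens are re-addressed by their position inside the cache rather than in the original text.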