mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License · 6.36k stars · 355 forks
Issues (sorted newest first)
#84 · Tokenizer issue with Transformers 4.33.0 · PedemonteGiacomo · opened 1 week ago · 0 comments
#83 · Evaluation code and dataset release inquiry · DerrickYLJ · opened 2 weeks ago · 0 comments
#82 · How to visualize attention logits? · OStars · closed 1 month ago · 1 comment
#81 · what is the difference between window attention and sliding window recomputation · seeyourcell · closed 1 month ago · 0 comments
#80 · Progressively decreasing attention windows · Vorlent · opened 1 month ago · 0 comments
#79 · Using LLaVA model · JesseZZZZZ · opened 1 month ago · 0 comments
#78 · why `max_gen_len` is needed when considering `space_needed`? · Mr-lonely0 · opened 3 months ago · 0 comments
#77 · How to evaluate ppl? · Jiawei-Yang · opened 3 months ago · 1 comment
#76 · StreamEval · Zhangchaoran000 · opened 5 months ago · 0 comments
#75 · Support mistral-7b? · spring1915 · opened 6 months ago · 0 comments
#74 · Run with start_size=0 looks just fine · cyr0930 · opened 6 months ago · 0 comments
#73 · question about positions encoding when apply ROLLING KV CACHE WITH ATTENTION SINKS · bugm · closed 6 months ago · 1 comment
#72 · Error happened · ForrestPi · opened 6 months ago · 2 comments
#71 · Questions about ARC datasets · Zoeyyao27 · opened 7 months ago · 0 comments
#70 · How much GPU memory needed to run example ? · fangming-he · opened 7 months ago · 3 comments
#69 · Is there the way of parallel prompt ? · DavideHe · opened 7 months ago · 0 comments
#68 · Question about attention sink arising in pretrained models · kevinli573 · opened 7 months ago · 0 comments
#67 · Request for Code and Details on Figures 2 and 7 · ZhouZineng · opened 7 months ago · 0 comments
#66 · Questions Related to the Application and Results of Attention Sinks After the Paper · dsdanielpark · opened 7 months ago · 0 comments
#65 · Questions Regarding "Sink Tokens" · clarenceluo78 · opened 8 months ago · 0 comments
#64 · Doubts in "run_streaming_llama.py" file · Rishab9991 · opened 8 months ago · 0 comments
#63 · Question about Naive Sliding Window · kevinli573 · closed 8 months ago · 2 comments
#62 · why starting sink token is not a special token '\n'? · dhcode-cpp · closed 8 months ago · 2 comments
#61 · Results for Section 3.2 Rolling KV Cache (Without Pretraining) · timljj · opened 8 months ago · 1 comment
#60 · The position id for q · ofhwei · opened 8 months ago · 1 comment
#59 · The reason for the importance of the initial token. · freyamom · opened 8 months ago · 0 comments
#58 · [Feature Request] Support InternLM Model · vansin · opened 8 months ago · 1 comment
#57 · Can support to ChatGLM2? · KareEnges · opened 8 months ago · 0 comments
#56 · Enable explictly setting transformer model cache · JiaxuanYou · opened 8 months ago · 0 comments
#55 · question about Table 1 in paper · AresXD · opened 8 months ago · 1 comment
#54 · question about initial tokens · chaojiewang94 · opened 8 months ago · 2 comments
#53 · While streaming with sinks, how does the framework change the positional encodings of the KV cache without having to multiply with the Key and Value matrices? · Bhuvanesh09 · opened 8 months ago · 4 comments
#52 · Finetuning a model in the streaming mode ? · MohamedAliRashad · closed 8 months ago · 1 comment
#51 · question about re-computation · ysanimals · closed 8 months ago · 4 comments
#50 · Implementation of lama2 7b chat hf model · MuhammadIshaq-AI · opened 8 months ago · 7 comments
#49 · Implementing lama2 7b · MuhammadIshaq-AI · closed 8 months ago · 0 comments
#48 · Is code's position wrong with "kv_cache.evict_for_space" ? · DavideHe · closed 8 months ago · 2 comments
#47 · some question about paper · Vincentyua · closed 8 months ago · 1 comment
#46 · Does past_key_values be repeatedly compute? · freyamom · opened 8 months ago · 5 comments
#45 · How to use streaming llm to train a new model? is there any sample code . thansk · mega-cqz · closed 8 months ago · 1 comment
#44 · I'm (A Bit) Suspicious of Table 3. · FrederickGeek8 · closed 8 months ago · 1 comment
#43 · Questions on the demo results · BitCalSaul · closed 8 months ago · 2 comments
#42 · Question on intuition of "attention sink" and "alibi PE" · bowencohere · closed 8 months ago · 3 comments
#41 · Question about long input and difference between streaming-llm and dense attention. · hxs91 · closed 8 months ago · 2 comments
#40 · RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' · chnl · closed 8 months ago · 2 comments
#39 · Question about evaluation results and demo · hsm1997 · closed 8 months ago · 2 comments
#38 · How to answer the question in the middle of long input · yangzhj53 · opened 9 months ago · 0 comments
#37 · RuntimeError in run_streaming_llama.py When Using Accelerate with Streaming LLMa Model on A4500 GPU · ZexinLi0w0 · opened 9 months ago · 4 comments
#36 · Questions about "Run Streaming Llama Chatbot" · ChuanhongLi · closed 8 months ago · 3 comments
#35 · Can support to codellama34b? · willshion · closed 9 months ago · 1 comment