hao-ai-lab/LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0
1.15k stars
67 forks
issues
AttributeError: type object 'GenerationMixin' has no attribute 'greedy_search'
#67
grandxin
opened
2 weeks ago
0
Question about batching strategy
#66
lxnlxnlxnlxnlxn
opened
1 month ago
0
Error with USE_LADE option
#65
lxnlxnlxnlxnlxn
opened
1 month ago
1
Yi Models support
#64
fayuge
opened
5 months ago
1
How to optimize the scene with relatively short output?
#63
yangbohust
opened
5 months ago
0
Bug in Greedy Search
#62
david-wei-01001
opened
6 months ago
0
Cannot use save_log and log_history
#61
david-wei-01001
opened
6 months ago
1
about verification branch
#60
dhcode-cpp
opened
6 months ago
1
TypeError: LlamaSdpaAttention.forward() got an unexpected keyword argument 'lookahead'
#59
zev123456
opened
7 months ago
5
Tensor parallel
#58
wangyuwen1999
closed
4 months ago
2
How to use LADE in single-node multi-process way?
#57
sjrrr13
opened
7 months ago
2
[BUG Report] jacobi_greedy_search_multilevel function bug
#56
yangbohust
opened
7 months ago
0
Do lookahead and repetition_penalty conflict?
#55
zhanweiw
opened
7 months ago
0
BUG Report
#54
yangbohust
opened
8 months ago
0
Why jacobi_sample_multilevel() fill window with argmax instead of also using sampling?
#53
yangbohust
opened
8 months ago
0
question on verification beginning pos
#52
felixdae
opened
8 months ago
1
Adaptive LEVEL/GUESS size?
#51
sahel-sh
opened
8 months ago
0
flash attention and sampling
#50
Viol2000
closed
9 months ago
0
add paper
#49
Viol2000
closed
9 months ago
0
specific model
#48
qspang
opened
10 months ago
13
Multiple GPUS
#47
qspang
opened
10 months ago
9
Compatibility with Flash Attention 2
#46
jasonli0707
closed
10 months ago
2
Related work: Prompt lookup decoding
#45
shermansiu
opened
10 months ago
7
Questions on combined attention mask structure for Jacobi iteration
#44
learning-chip
opened
10 months ago
23
support hf v4.36
#43
Viol2000
closed
10 months ago
0
Undefined symbols in lade/decoding.py
#42
hjmus
closed
10 months ago
1
Run time error with `lade.config_lade(LEVEL=2)`
#41
learning-chip
closed
10 months ago
3
TypeError: 'NoneType' object is not subscriptable at line 427 of decoding.py
#40
henryxiao1997
closed
10 months ago
5
Any plan to support other LLMs, and integrated into huggingface?
#39
henryxiao1997
closed
11 months ago
2
fix llama kv cache
#38
jiqing-feng
closed
10 months ago
2
Qs on Understanding Lookahead and Jacobi
#37
RonanKMcGovern
closed
11 months ago
7
Can I get a MT-bench evaluation code for reproduction of acceleration?
#36
je1lee
opened
11 months ago
11
Incompatible with LlamaSdpaAttention in transformers v4.36
#35
learning-chip
closed
10 months ago
1
Why the tup needs to be added to the tail of token_map when it found in the token_map?
#34
kevinoldching
closed
11 months ago
2
Why using Jacobi decoding? What are the advantages besides the fact that Jacobi decoding can reduce some of the decode steps?
#33
ZipECHO
closed
9 months ago
19
Questions on the attention mask, and whether to accept the last element of guess_results when all guess_tokens are accepted
#32
YingHH1
opened
11 months ago
6
The meaning of the CHAT option in decoding
#31
YingHH1
closed
11 months ago
1
The inference results are inconsistent with Huggingface.
#30
cyfwry
opened
11 months ago
6
Does Lade support topp/topk/temperature sampling?
#29
AlvL1225
opened
11 months ago
2
question about attention patterns
#28
SUDA-HLT-ywfang
closed
11 months ago
6
Can't run minimal.py on A100
#27
jiqing-feng
closed
11 months ago
3
Benchmarks comparing with Medusa
#26
Rock-Anderson
opened
12 months ago
0
Update decoding.py
#25
eltociear
closed
12 months ago
0
Is it similar to ProphetNet, ProphetNet-Ads and BANG?
#24
qiweizhen
closed
10 months ago
2
Is it the same as ProphetNet and ProphetNet-Ads?
#23
qiweizhen
closed
10 months ago
0
amazing work!! any plan support for chatglm or Qwen model?
#22
white-wolf-tech
opened
12 months ago
1
Can lade accelerate T5?
#21
yjdy
closed
11 months ago
1
The Jacobi method and its corresponding code
#20
YingHH1
closed
12 months ago
2
No speed up
#19
Louis-y-nlp
opened
12 months ago
5
Support for bnb nf4 quant?
#18
col-in-coding
opened
12 months ago
1