hao-ai-lab LookaheadDecoding issues


[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

https://arxiv.org/abs/2402.02057

Apache License 2.0

1.15k stars, 67 forks

Issues (sorted by newest)

AttributeError: type object 'GenerationMixin' has no attribute 'greedy_search'

#67 grandxin opened 2 weeks ago · 0 comments
Question about batching strategy

#66 lxnlxnlxnlxnlxn opened 1 month ago · 0 comments
Error with USE_LADE option

#65 lxnlxnlxnlxnlxn opened 1 month ago · 1 comment
Yi Models support

#64 fayuge opened 5 months ago · 1 comment
How to optimize the scene with relatively short output?

#63 yangbohust opened 5 months ago · 0 comments
Bug in Greedy Search

#62 david-wei-01001 opened 6 months ago · 0 comments
Cannot use save_log and log_history

#61 david-wei-01001 opened 6 months ago · 1 comment
about verification branch

#60 dhcode-cpp opened 6 months ago · 1 comment
TypeError: LlamaSdpaAttention.forward() got an unexpected keyword argument 'lookahead'

#59 zev123456 opened 7 months ago · 5 comments
Tensor parallel

#58 wangyuwen1999 closed 4 months ago · 2 comments
How to use LADE in single-node multi-process way?

#57 sjrrr13 opened 7 months ago · 2 comments
[BUG Report] jacobi_greedy_search_multilevel function bug

#56 yangbohust opened 7 months ago · 0 comments
Do lookahead and repetition_penalty conflict?

#55 zhanweiw opened 7 months ago · 0 comments
BUG Report

#54 yangbohust opened 8 months ago · 0 comments
Why jacobi_sample_multilevel() fill window with argmax instead of also using sampling?

#53 yangbohust opened 8 months ago · 0 comments
question on verification beginning pos

#52 felixdae opened 8 months ago · 1 comment
Adaptive LEVEL/GUESS size?

#51 sahel-sh opened 8 months ago · 0 comments
flash attention and sampling

#50 Viol2000 closed 9 months ago · 0 comments
add paper

#49 Viol2000 closed 9 months ago · 0 comments
specific model

#48 qspang opened 10 months ago · 13 comments
Multiple GPUS

#47 qspang opened 10 months ago · 9 comments
Compatibility with Flash Attention 2

#46 jasonli0707 closed 10 months ago · 2 comments
Related work: Prompt lookup decoding

#45 shermansiu opened 10 months ago · 7 comments
Questions on combined attention mask structure for Jacobi iteration

#44 learning-chip opened 10 months ago · 23 comments
support hf v4.36

#43 Viol2000 closed 10 months ago · 0 comments
Undefined symbols in lade/decoding.py

#42 hjmus closed 10 months ago · 1 comment
Run time error with `lade.config_lade(LEVEL=2)`

#41 learning-chip closed 10 months ago · 3 comments
TypeError: 'NoneType' object is not subscriptable at line 427 of decoding.py

#40 henryxiao1997 closed 10 months ago · 5 comments
Any plan to support other LLMs, and integrated into huggingface?

#39 henryxiao1997 closed 11 months ago · 2 comments
fix llama kv cache

#38 jiqing-feng closed 10 months ago · 2 comments
Qs on Understanding Lookahead and Jacobi

#37 RonanKMcGovern closed 11 months ago · 7 comments
Can I get a MT-bench evaluation code for reproduction of acceleration?

#36 je1lee opened 11 months ago · 11 comments
Incompatible with LlamaSdpaAttention in transformers v4.36

#35 learning-chip closed 10 months ago · 1 comment
Why the tup needs to be added to the tail of token_map when it found in the token_map?

#34 kevinoldching closed 11 months ago · 2 comments
Why using Jacobi decoding? What are the advantages besides the fact that Jacobi decoding can reduce some of the decode steps？

#33 ZipECHO closed 9 months ago · 19 comments
Questions on the attention mask, and whether to accept the last element of guess_results when all guess_tokens are accepted

#32 YingHH1 opened 11 months ago · 6 comments
The meaning of the CHAT option in decoding

#31 YingHH1 closed 11 months ago · 1 comment
The inference results are inconsistent with Huggingface.

#30 cyfwry opened 11 months ago · 6 comments
Does Lade support topp/topk/temperature sampleing?

#29 AlvL1225 opened 11 months ago · 2 comments
question about attention patterns

#28 SUDA-HLT-ywfang closed 11 months ago · 6 comments
Can't run minimal.py on A100

#27 jiqing-feng closed 11 months ago · 3 comments
Benchmarks comparing with Medusa

#26 Rock-Anderson opened 12 months ago · 0 comments
Update decoding.py

#25 eltociear closed 12 months ago · 0 comments
Is it similar to ProphetNet, ProphetNet-Ads and BANG?

#24 qiweizhen closed 10 months ago · 2 comments
Is it the same as ProphetNet and ProphetNet-Ads?

#23 qiweizhen closed 10 months ago · 0 comments
amazing work!! any plan support for chatglm or Qwen model?

#22 white-wolf-tech opened 12 months ago · 1 comment
Can lade accelerate T5?

#21 yjdy closed 11 months ago · 1 comment
The Jacobi method and its corresponding code

#20 YingHH1 closed 12 months ago · 2 comments
No speed up

#19 Louis-y-nlp opened 12 months ago · 5 comments
Support for bnb nf4 quant?

#18 col-in-coding opened 12 months ago · 1 comment