Open qspang opened 10 months ago
Can I ask you what models the following three images correspond to?
1) codellama 2) codellama-instruct 3) llama-2-chat
By the way, may I ask whether AVG COMPRESS RATIO reflects Lookahead's acceleration effect?
AVG COMPRESS RATIO is the #generated tokens / #decoding steps with lookahead decoding. It is an upper bound on the speedup.
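For concreteness, here is a minimal sketch of that ratio computed from per-step statistics (the function and variable names are illustrative, not from the repo's eval scripts):

```python
# Minimal sketch: AVG COMPRESS RATIO = #generated tokens / #decoding steps.
# `tokens_per_step` is a hypothetical list holding the number of tokens
# accepted at each lookahead decoding step (1 when nothing extra is verified,
# more when speculated tokens are accepted).
def avg_compress_ratio(tokens_per_step):
    total_tokens = sum(tokens_per_step)  # #generated tokens
    total_steps = len(tokens_per_step)   # #decoding steps
    return total_tokens / total_steps

# Example: 10 steps producing 23 tokens -> ratio 2.3,
# i.e. at most ~2.3x speedup over one-token-per-step decoding.
print(avg_compress_ratio([3, 2, 1, 4, 2, 3, 1, 2, 3, 2]))  # 2.3
```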
I want to know the actual acceleration. Which metric should I look at?
I guess you need to re-run the experiments without lade, like running with USE_LADE=0. Then you can compare the average throughputs shown in the figure above with the throughputs without lade.
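For example, a rough sketch of such a comparison, assuming your evaluation script honors the USE_LADE environment variable as described above (the script name below is just a placeholder for whatever you normally run):

```python
import os
import subprocess

# Run the same benchmark twice: once with lookahead decoding enabled
# (USE_LADE=1) and once disabled (USE_LADE=0), then compare the reported
# throughputs. "eval_mtbench.py" is a placeholder script name.
for use_lade in ("1", "0"):
    env = dict(os.environ, USE_LADE=use_lade)
    print(f"--- USE_LADE={use_lade} ---")
    subprocess.run(["python", "eval_mtbench.py"], env=env, check=True)
```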
But there are two average throughputs: average throughput 1 and average throughput 2. Which one should be used for comparison, or should both be used and then averaged?
I think both are reasonable. Throughput1 is sum(throughput for each question)/#questions. Throughput2 is #generated tokens/#decoding steps for the whole dataset.
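For example, a small sketch of the two averages under those definitions (the record structure and field names are made up for illustration, and per-question throughput is taken as generated tokens divided by generation time, as clarified below):

```python
# Hypothetical per-question records from an evaluation run.
results = [
    {"tokens": 230, "seconds": 4.0, "steps": 100},
    {"tokens": 180, "seconds": 3.0, "steps": 80},
]

# Throughput1: mean of per-question throughput (generated tokens / time).
throughput1 = sum(r["tokens"] / r["seconds"] for r in results) / len(results)

# Throughput2: dataset-level ratio, #generated tokens / #decoding steps
# over the whole dataset, as described in the reply above.
throughput2 = sum(r["tokens"] for r in results) / sum(r["steps"] for r in results)

print(f"throughput1 = {throughput1:.2f} tokens/s")
print(f"throughput2 = {throughput2:.2f} tokens/step")
```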
Thank you for your patient reply!!! :)
How do you measure "throughput for each question"?
generated tokens/#time
Hello, can you provide a script to evaluate codellama on Human-Eval, like what you do on MT-bench?
I will upload them in the next week or two.
OK, thanks for your amazing work!