hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0

Related work: Prompt lookup decoding #45

Open shermansiu opened 10 months ago

shermansiu commented 10 months ago

https://github.com/apoorvumang/prompt-lookup-decoding

This method was recently merged into Hugging Face Transformers; it also uses n-grams (found in the input prompt) to accelerate decoding.
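For reference, here's a minimal sketch of the core idea (illustrative only; the function name and defaults are mine, not the merged implementation): match the trailing n-gram of the sequence against an earlier occurrence and copy the tokens that followed it as draft candidates.

```python
import torch

def find_candidate_tokens(input_ids: torch.Tensor, ngram_size: int = 3, num_draft: int = 10) -> torch.Tensor:
    """Sketch of prompt-lookup candidate generation for a batch of size 1."""
    seq = input_ids[0]
    ngram = seq[-ngram_size:]
    # Scan backwards for an earlier occurrence of the trailing n-gram.
    for start in range(len(seq) - ngram_size - 1, -1, -1):
        if torch.equal(seq[start:start + ngram_size], ngram):
            copy_from = start + ngram_size
            # Copy the tokens that followed the match as draft candidates.
            return seq[copy_from:copy_from + num_draft]
    # No earlier match: return an empty draft and decode normally.
    return seq.new_empty(0)
```

As far as I can tell, the PR plugs a candidate generator along these lines into the existing assisted-decoding path, so the verification machinery is shared with regular speculative decoding.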

learning-chip commented 10 months ago

Interesting. Could you point to the merged PR? Does it support batching?

This method has a similar idea (copy from input, no Jacobi): https://github.com/alipay/PainlessInferenceAcceleration

shermansiu commented 10 months ago

Here's the PR: https://github.com/huggingface/transformers/pull/27775

From a cursory glance at the PR, it seems like it supports batching.

dongxiaolong commented 10 months ago

> Here's the PR: huggingface/transformers#27775
>
> From a cursory glance at the PR, it seems like it supports batching.

I have also noticed these two methods. Do you know the specific difference between them?

shermansiu commented 10 months ago

Lookahead decoding takes its n-grams from prior lookahead decoding steps (Jacobi trajectories); prompt lookup decoding takes its n-grams from the prompt.
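In both cases the proposed n-gram is then verified by the base model in a single forward pass; only the source of the candidates differs. A rough greedy-verification sketch (the names and shapes below are my own, not either repo's API):

```python
import torch

def verify_draft(model, input_ids, draft_tokens):
    """Greedy verification sketch: one forward pass over prompt + draft, then
    accept the longest prefix of the draft matching the model's own argmax."""
    candidate = torch.cat([input_ids, draft_tokens.unsqueeze(0)], dim=-1)
    logits = model(candidate).logits  # assumes an HF-style CausalLM output
    # The logit at position i predicts the token at position i + 1, so these
    # are the model's own choices for each draft position.
    preds = logits[0, input_ids.shape[-1] - 1:-1].argmax(dim=-1)
    accepted = 0
    for pred, tok in zip(preds, draft_tokens):
        accepted += 1    # the model-predicted token is always kept
        if pred != tok:  # first mismatch ends the accepted prefix
            break
    return torch.cat([input_ids, preds[:accepted].unsqueeze(0)], dim=-1)
```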

learning-chip commented 10 months ago

> it seems like it supports batching.

It doesn't :/ https://github.com/huggingface/transformers/pull/27775#issuecomment-1901225695

shermansiu commented 10 months ago

Interesting. As the comment also suggests, it seems like PLD can support batching in theory; it's just the current implementation that doesn't.
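The per-row lookup itself is easy to batch; the tricky part is the generation loop, where each row may accept a different number of draft tokens. A hypothetical sketch, reusing the `find_candidate_tokens` helper from the earlier comment:

```python
def find_candidate_tokens_batched(input_ids, ngram_size=3, num_draft=10):
    """Hypothetical batched lookup: match n-grams independently per row.
    The hard part (not shown) is the decoding loop afterwards, where rows
    accept different numbers of draft tokens and sequences become ragged."""
    return [
        find_candidate_tokens(row.unsqueeze(0), ngram_size, num_draft)
        for row in input_ids
    ]
```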

jivanph commented 10 months ago

Lookahead decoding was mentioned here: https://github.com/SafeAILab/EAGLE