shermansiu opened 10 months ago
Interesting, could you point to the merged PR? Does it support batching?
This method has a similar idea (copy from input, no Jacobi): https://github.com/alipay/PainlessInferenceAcceleration
Here's the PR: https://github.com/huggingface/transformers/pull/27775
From a cursory glance at the PR, it seems like it supports batching.
I have also noticed these two methods. Do you know the specific difference between them?
Lookahead decoding takes its n-grams from prior lookahead decoding steps (Jacobi trajectories), whereas prompt lookup decoding takes its n-grams from the prompt.
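To make the prompt lookup side concrete, here is a minimal pure-Python sketch of the n-gram matching step. The function name `find_candidate` and its parameters are hypothetical, not taken from either repository's implementation:

```python
def find_candidate(tokens, max_ngram=3, num_draft=5):
    """Search the existing token sequence for an earlier occurrence of its
    own trailing n-gram; if found, propose the tokens that followed that
    occurrence as a draft continuation (the core idea of prompt lookup
    decoding). Tries longer n-grams first, preferring the most recent match.
    """
    for n in range(max_ngram, 0, -1):
        suffix = tokens[-n:]
        # Scan earlier positions (newest first) for a match of the suffix.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                draft = tokens[start + n:start + n + num_draft]
                if draft:
                    return draft
    return []
```

Lookahead decoding would instead fill its candidate pool from n-grams observed in earlier Jacobi iterations rather than scanning the prompt like this.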
> it seems like it supports batching.
It doesn't :/ https://github.com/huggingface/transformers/pull/27775#issuecomment-1901225695
Interesting. As the comment also suggests, PLD can support batching in theory; it is only the current implementation that lacks it.
Lookahead was mentioned here https://github.com/SafeAILab/EAGLE
https://github.com/apoorvumang/prompt-lookup-decoding
This method was recently merged into Hugging Face `transformers` and also uses n-grams (found in the input prompt) to accelerate decoding.
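For completeness, any such draft-from-the-prompt scheme still needs a verification step so that greedy output is unchanged. Below is a hedged pure-Python sketch; `verify` and `next_token_fn` are hypothetical names, and a real implementation would score all draft positions in a single batched forward pass rather than calling the model once per token:

```python
def verify(tokens, draft, next_token_fn):
    """Accept draft tokens one by one while each agrees with the model's
    own greedy choice; on the first mismatch, emit the model's correction
    and stop. If the whole draft is accepted, emit one extra model token,
    so at least one new token is always produced per verification round.
    """
    accepted = []
    for tok in draft:
        expected = next_token_fn(tokens + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # model's correction replaces the draft
            break
    else:
        # Entire draft matched: append the model's next token as a bonus.
        accepted.append(next_token_fn(tokens + accepted))
    return accepted
```

Because rejected positions are replaced by the model's own greedy token, the final sequence is identical to plain greedy decoding; the draft only changes how many tokens are committed per model call.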