hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0

The Jacobi method and its corresponding code #20

Closed · YingHH1 closed this issue 12 months ago

YingHH1 commented 12 months ago

In the blog, the Jacobi method links to the wiki page on the Jacobi iterative method for solving linear systems. But here we are solving a nonlinear system of equations, so what exactly is the algorithm? Could it be the Gauss-Jacobi method described in https://www.researchgate.net/profile/Romulo-Chumacero/publication/265633573_Solving_Nonlinear_Equations/links/556e350708aec2268308c3d3/Solving-Nonlinear-Equations.pdf?

Furthermore, if it is an iterative method for solving nonlinear equations, wouldn't the solutions be floating-point numbers rather than the integer token ids in input_ids for an LLM?

I would like to understand the overall structure of the code and the details of the whole Lookahead Decoding algorithm, including the part that solves the nonlinear system with the Jacobi method. However, I couldn't locate the relevant code, so any pointer to where the Jacobi method is implemented would be greatly appreciated.

Many thanks for the help!

Viol2000 commented 12 months ago

Hi, thanks for your interest!

I think Jacobi works similarly for non-linear and linear systems: if every equation in the system is a linear function, it is a linear system; otherwise it is a non-linear system, but the update scheme carries over. Moreover, Gauss-Seidel and Jacobi differ on the parallelism side: Jacobi updates every unknown from the previous iterate, so all updates can run in parallel, while Gauss-Seidel consumes each new value as soon as it is computed and is therefore sequential. You can take a look at these two papers: https://proceedings.mlr.press/v139/song21a/song21a.pdf and https://arxiv.org/pdf/2305.10427.pdf.
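For a concrete picture of that parallelism difference, here is a minimal NumPy sketch of one sweep of each method on a linear system Ax = b (assuming A has a nonzero diagonal; the function names are mine, not from the repo):

```python
import numpy as np

def jacobi_sweep(A, b, x):
    # Jacobi: every component is updated from the OLD iterate x,
    # so all n updates are independent and can run in parallel.
    D = np.diag(A)
    return (b - (A @ x - D * x)) / D

def gauss_seidel_sweep(A, b, x):
    # Gauss-Seidel: each component uses the values already updated
    # in THIS sweep, which makes the sweep inherently sequential.
    x = x.copy()
    for i in range(len(b)):
        s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (b[i] - s) / A[i, i]
    return x
```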

The previous methods tried to apply Jacobi iteration to accelerate the language-model decoding process, but I think they do not achieve a very visible speedup. I think one reason is exactly the point you raise: the Jacobi method does not work well here because the inputs and outputs in LLM decoding are discrete integer token ids rather than floating-point values.
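To make the discrete setting concrete, here is a minimal sketch of greedy Jacobi decoding for a causal LM, in the spirit of the two papers above (assuming a Hugging-Face-style model whose forward pass returns .logits; the function name and initialization are mine, not this repo's code). The argmax keeps every iterate a vector of integer token ids, which is why no floating-point solutions appear:

```python
import torch

@torch.no_grad()
def jacobi_decode(model, prompt_ids, n_tokens, max_iters=50):
    """Greedy Jacobi decoding sketch: guess n_tokens at once, then
    refine ALL positions in parallel from the previous iterate until
    the guess is a fixed point of greedy decoding."""
    # Initial guess: repeat the last prompt token (any guess works).
    guess = prompt_ids[:, -1:].repeat(1, n_tokens)
    for _ in range(max_iters):
        ids = torch.cat([prompt_ids, guess], dim=-1)
        logits = model(ids).logits           # one parallel forward pass
        # The logits at position t predict the token at t+1, so the
        # n_tokens guessed positions are re-predicted by the slice below.
        new_guess = logits[:, -n_tokens - 1:-1].argmax(dim=-1)
        if torch.equal(new_guess, guess):    # fixed point reached: done
            break
        guess = new_guess
    return guess
```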

Our method is motivated by Jacobi iteration but is actually different from it. The lookahead branch is like a delayed Jacobi iteration (I do not know its exact name): it takes inputs from a much earlier step rather than only the last step to solve several non-linear systems (the LLM) and obtain an output for each window slot, and we capture the tokens generated along this process. When the N-grams' N=2, it becomes exactly a Jacobi iteration. You can simply think of it this way: the LLM decoding one token solves one non-linear equation, and decoding several consecutive tokens at once solves a non-linear system in parallel, i.e. in a Jacobi style.
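To spell out the "non-linear system" view (my notation, not the paper's): with prompt $x$ and $n$ positions to fill, greedy decoding is a fixed point of

$$
y_i = \operatorname*{arg\,max}_{v}\, p_\theta\bigl(v \mid x, y_1, \dots, y_{i-1}\bigr), \qquad i = 1, \dots, n,
$$

and a Jacobi sweep plugs the previous iterate $y^{(k)}$ into every right-hand side simultaneously, so one parallel forward pass refreshes all $n$ positions at once.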

If you want to dive into the code, I think you should set aside the Jacobi iteration for now and focus on our algorithm instead, because it is not exactly the Jacobi you are looking for. You can take a look at and track the variable past_tokens: it is updated in a Jacobi-iteration style to obtain new n-grams.
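As a rough mental model of how past_tokens evolves (a conceptual sketch only; refresh stands in for the repo's parallel forward pass, and the real bookkeeping is more involved): each step keeps the window guesses from the last N-1 steps, refreshes them in one pass, and harvests one N-gram per remaining slot before dropping the oldest level:

```python
def lookahead_step(refresh, levels, window):
    """One lookahead step over a window of guessed tokens.

    levels: guesses from the last N-1 steps, oldest first; each is a
    list of `window` token ids. `refresh` is a hypothetical stand-in
    for the single parallel forward pass that re-guesses every slot.
    """
    new_level = refresh(levels)
    n_minus_1 = len(levels)
    ngrams = []
    for j in range(window - n_minus_1):
        # Read along the diagonal: the guess at (step t, slot j) was
        # conditioned on the guess at (step t-1, slot j-1), so the
        # diagonal reads as N plausibly-consecutive tokens.
        gram = tuple(levels[k][j + k] for k in range(n_minus_1))
        ngrams.append(gram + (new_level[j + n_minus_1],))
    levels = levels[1:] + [new_level]   # drop oldest level: "delayed" Jacobi
    return levels, ngrams
```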

YingHH1 commented 12 months ago

Thank you for the response. I think I understand the whole algorithm now.