Question on Initial guess tokens

hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Apache License 2.0

1.11k stars, 66 forks

Hello, the initialization of the W tokens is here: https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/lade/decoding.py#L223

Our method uses a multi-level lookahead window, but we only set one level at the beginning. Decoding then runs several warmup steps to fill the whole 2-D window.
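To make the warmup idea concrete, here is a rough sketch, not the code in lade/decoding.py. It assumes the window is a list of levels, that only the first level is set up front, and that each warmup step fills exactly one more level. The names `init_window`, `toy_step`, and `warmup` are hypothetical, and `toy_step` is a placeholder for a real model forward pass.

```python
def init_window(first_level, num_levels):
    # Only the first level holds tokens at the start; the rest are unfilled (None).
    return [list(first_level)] + [None] * (num_levels - 1)

def toy_step(prev_level, vocab_size=100):
    # Hypothetical placeholder for a model step: derive a new level
    # from the previous one with a fixed deterministic rule.
    return [(7 * t + 3) % vocab_size for t in prev_level]

def warmup(window):
    # Assumption: each warmup step fills the next empty level of the window.
    steps = 0
    while None in window:
        level = window.index(None)
        window[level] = toy_step(window[level - 1])
        steps += 1
    return steps

window = init_window([5, 17, 42, 8], num_levels=3)
warmup_steps = warmup(window)
print(warmup_steps, window)
```

With 3 levels and 1 level set initially, this toy version needs 2 warmup steps before the 2-D window is full.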

The set_token function defines the token initialization method. You can change it by changing the code here: https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/lade/decoding.py#L221C5-L221C26

Currently, we copy the initial tokens from the prompt. Even if the tokens were selected randomly from the vocabulary, lookahead decoding would still converge and achieve a speedup, thanks to the magic of Jacobi iteration.
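A toy illustration of why the initial guess does not affect the final output: the sketch below runs plain Jacobi iteration (not full lookahead decoding) against a hypothetical deterministic `next_token` function that stands in for a greedy LLM step. All names here are made up for illustration and none of them come from the repository. After k iterations the first k guessed tokens match greedy decoding, so any initial guess reaches the same fixed point.

```python
import random

VOCAB_SIZE = 50

def next_token(prefix):
    # Hypothetical stand-in for greedy decoding: a deterministic
    # function of the whole prefix.
    h = 0
    for t in prefix:
        h = (h * 31 + t + 7) % 1000003
    return h % VOCAB_SIZE

def greedy_decode(prompt, n):
    # Reference: ordinary sequential decoding, one token per step.
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, guess):
    # Update every guessed position "in parallel" from the previous guess
    # until nothing changes (a fixed point).
    guess = list(guess)
    steps = 0
    while True:
        new = [next_token(list(prompt) + guess[:i]) for i in range(len(guess))]
        steps += 1
        if new == guess:
            return guess, steps
        guess = new

prompt = [3, 14, 15, 9, 2, 6]
n = 5
reference = greedy_decode(prompt, n)

# Initial guess copied from the prompt (assumed here: its last n tokens).
from_prompt, steps_prompt = jacobi_decode(prompt, prompt[-n:])

# Initial guess drawn at random from the vocabulary.
rng = random.Random(0)
random_guess = [rng.randrange(VOCAB_SIZE) for _ in range(n)]
from_random, steps_random = jacobi_decode(prompt, random_guess)

print(from_prompt == reference, from_random == reference)
```

Both starting points end at the greedy output in at most n + 1 iterations here. In this toy, the initial guess can only change how many iterations are needed, not the result.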

Question on Initial guess tokens #8