hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0

Run time error with `lade.config_lade(LEVEL=2)` #41

Closed: learning-chip closed this issue 10 months ago

learning-chip commented 10 months ago

The blog mentions that:

It's important to note that when N=2, lookahead decoding essentially becomes equivalent to Jacobi decoding.

However, this edge case does not seem to be handled properly. Setting `lade.config_lade(LEVEL=2)` leads to an error:

```
$ LOAD_LADE=1 USE_LADE=1 python minimal.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/LookaheadDecoding/minimal.py", line 28, in <module>
    greedy_output = model.generate(**model_inputs, max_new_tokens=128)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
    return self.greedy_search(
  File "/home/LookaheadDecoding/lade/decoding.py", line 24, in greedy_search_proxy
    return jacobi_greedy_search_multilevel(self, chat=False, *args, **kwargs)
  File "/home/LookaheadDecoding/lade/decoding.py", line 257, in jacobi_greedy_search_multilevel
    if past_tokens[LEVEL - 2] is not None and lst_token in token_map and GUESS_SET_SIZE > 0:
UnboundLocalError: local variable 'lst_token' referenced before assignment
```
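For context, `minimal.py` here is essentially a script along the following lines (a hedged reconstruction, not the repo's exact file: the checkpoint and prompt are placeholders, and the `lade` calls follow the README usage):

```python
# Hedged reconstruction of a minimal repro; run with LOAD_LADE=1 USE_LADE=1.
# The checkpoint and prompt are placeholders, not the repo's exact minimal.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import lade
lade.augment_all()
# LEVEL=2 is the edge case that triggers the UnboundLocalError above.
lade.config_lade(LEVEL=2, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

model_inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
greedy_output = model.generate(**model_inputs, max_new_tokens=128)  # fails when LEVEL=2
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```
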
Viol2000 commented 10 months ago

Yes, the logic of LEVEL=2 is slightly different from LEVEL>2, and I did not implement LEVEL=2, for simplicity. Our algorithm is similar to Jacobi decoding when LEVEL=2 (N=2), but the logic still differs: Jacobi decoding performs decoding and verification in a single branch, while we put decoding and verification into two separate branches (see the sketch below). If you want the original Jacobi decoding, you can try this repo: https://github.com/teelinsan/parallel-decoding
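For illustration, here is a minimal sketch of plain Jacobi greedy decoding, assuming a standard Hugging Face causal LM whose forward pass returns logits of shape `[1, seq_len, vocab]`; this is illustrative pseudocode of the general idea, not this repo's implementation:

```python
import torch

# Minimal sketch of plain Jacobi decoding (greedy). Decoding and verification
# share one branch: each iteration re-decodes all guessed tokens in parallel,
# and convergence to a fixed point is the verification. Not this repo's code.
@torch.no_grad()
def jacobi_greedy(model, prompt_ids, n_new=16, max_iters=64, pad_id=0):
    # Start from an arbitrary guess for the n_new future tokens.
    guess = torch.full((n_new,), pad_id, dtype=torch.long, device=prompt_ids.device)
    for _ in range(max_iters):
        seq = torch.cat([prompt_ids, guess]).unsqueeze(0)
        logits = model(seq).logits[0]
        # logits[i] predicts token i+1, so all guessed positions are updated
        # in parallel from the previous iterate's left context.
        new_guess = logits[prompt_ids.numel() - 1 : -1].argmax(dim=-1)
        if torch.equal(new_guess, guess):  # fixed point: guesses verified
            break
        guess = new_guess
    return guess
```

Lookahead decoding instead runs two branches in the same forward pass: a lookahead branch that keeps generating n-gram candidates, and a verification branch that checks those candidates against the model's output.
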

jivanph commented 10 months ago

I was hoping to compare speedups between Jacobi decoding and lookahead decoding, to better understand the trade-off between extra FLOPs and lower latency. Is it possible that the LEVEL=2 case will be implemented in LADE in the future?

Viol2000 commented 10 months ago

Currently, I am not planning to implement it (and LEVEL=2 would not be the pure Jacobi decoding you may want). It is better to follow this repo if you want to implement Jacobi decoding: https://github.com/teelinsan/parallel-decoding