hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0

Error with USE_LADE option #65

Open lxnlxnlxnlxnlxn opened 1 day ago

lxnlxnlxnlxnlxn commented 1 day ago

I have installed the requirements successfully and get the correct result when running minimal.py:

```bash
python minimal.py                          # no Lookahead Decoding
USE_LADE=1 LOAD_LADE=1 python minimal.py   # use Lookahead Decoding, 1.6x speedup
```

But there is an error when chatting with chatbot.py under `USE_LADE=0`:

```bash
USE_LADE=0 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug --chat  # chat, without lookahead
USE_LADE=0 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug         # no chat, without lookahead
```

The log is as follows:

```
User: Which methods did Socrates employ to challenge the prevailing thoughts of his time?
Assistant: Traceback (most recent call last):
  File "/workspace/LookaheadDecoding/applications/chatbot.py", line 79, in <module>
    tmp_greedy_output = model.generate(input_ids=input_ids, **tmp_kwargs).tolist() #warmup
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1718, in generate
    return self.greedy_search(
  File "/workspace/LookaheadDecoding/lade/decoding.py", line 26, in greedy_search_proxy
    return FUNC_MAP["greedy_search"](self, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2579, in greedy_search
    outputs = self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/LookaheadDecoding/lade/models/modeling_llama.py", line 1335, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'
```
Running the same command against `/workspace/models/TinyLlama-1.1B-Chat-v1.0 --debug` produces the identical traceback.
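If I read the trace right, the call into the inner `self.model` falls through to `nn.Module`'s default `forward`, i.e. `_forward_unimplemented`, which accepts positional arguments only; that would mean the wrapped model's `forward` is never bound when `USE_LADE=0`. A standalone snippet (illustrative only, not lade's code) reproduces the exact message:

```python
# Illustrative reproduction of the failure mode, not code from lade:
# an nn.Module subclass that never defines forward() falls back to
# nn.Module._forward_unimplemented, which takes *input only (no kwargs).
import torch.nn as nn

class Inner(nn.Module):   # hypothetical stand-in for the wrapped self.model
    pass                  # no forward() defined

Inner()(input_ids=None)
# TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'
```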

Furthermore, I get the correct result when setting `USE_LADE=1`.
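For reference, the in-process setup follows the README's documented usage; the gating on `USE_LADE` below is my own sketch of what the environment variable toggles, not the repo's exact code:

```python
# Sketch of the README-documented setup; the env-var check itself
# is my assumption about what USE_LADE gates.
import os
import lade

if int(os.environ.get("USE_LADE", "0")):
    lade.augment_all()  # patch transformers' decoding with lookahead decoding
    lade.config_lade(LEVEL=5, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)
```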

I installed the project on an Ubuntu 20.04 system with A800 GPUs, with PyTorch 1.13.1 and CUDA 11.6.

michaelyaoxxx commented 22 hours ago

I am having the same issue.

System Info: