I installed the requirements successfully and got the correct result when running minimal.py:

python minimal.py # no Lookahead decoding
USE_LADE=1 LOAD_LADE=1 python minimal.py # use Lookahead decoding, 1.6x speedup

But there is an error when chatting with chatbot.py when setting USE_LADE=0:

USE_LADE=0 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug --chat # chat, without lookahead
USE_LADE=0 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug # no chat, without lookahead

The log is as follows:
User: Which methods did Socrates employ to challenge the prevailing thoughts of his time?
Assistant: Traceback (most recent call last):
File "/workspace/LookaheadDecoding/applications/chatbot.py", line 79, in <module>
tmp_greedy_output = model.generate(input_ids=input_ids, **tmp_kwargs).tolist() #warmup
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1718, in generate
return self.greedy_search(
File "/workspace/LookaheadDecoding/lade/decoding.py", line 26, in greedy_search_proxy
return FUNC_MAP["greedy_search"](self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2579, in greedy_search
outputs = self(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/LookaheadDecoding/lade/models/modeling_llama.py", line 1335, in forward
outputs = self.model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I hit the same error when running with a local model:
/workspace/models/TinyLlama-1.1B-Chat-v1.0 --debug
User: Which methods did Socrates employ to challenge the prevailing thoughts of his time?
Assistant: Traceback (most recent call last):
File "/workspace/LookaheadDecoding/applications/chatbot.py", line 79, in <module>
tmp_greedy_output = model.generate(input_ids=input_ids, **tmp_kwargs).tolist() #warmup
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1718, in generate
return self.greedy_search(
File "/workspace/LookaheadDecoding/lade/decoding.py", line 26, in greedy_search_proxy
return FUNC_MAP["greedy_search"](self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2579, in greedy_search
outputs = self(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/LookaheadDecoding/lade/models/modeling_llama.py", line 1335, in forward
outputs = self.model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'
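For what it's worth, the final TypeError is what PyTorch raises when an nn.Module whose forward() was never defined (or was never patched in) is called with keyword arguments, since the class falls back to torch's default _forward_unimplemented. A standalone sketch reproduces the same message, which suggests that with USE_LADE=0 the wrapped self.model ends up without a forward implementation:

```python
import torch.nn as nn

# A Module that never defines forward() falls back to PyTorch's
# _forward_unimplemented, whose signature accepts only positional
# *input -- so any keyword argument triggers the same TypeError
# seen in the chatbot.py log.
class Broken(nn.Module):
    pass

m = Broken()
try:
    m(input_ids=None)
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'input_ids'
```

This is only a reproduction of the symptom, not the root cause; it points at the model's forward not being installed rather than at generate() itself.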
Furthermore, I get the correct result when setting USE_LADE=1.
I installed the project on Ubuntu 20.04 with A800 GPUs, PyTorch 1.13.1, and CUDA 11.6.