mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Llama2 InternalError: Check failed: (!output_ids_.empty()) is false #1029

Closed · zhangxiao-stack closed this issue 1 year ago

zhangxiao-stack commented 1 year ago

šŸ› Bug

File "/public/llm_chat.cc", line 791 InternalError: Check failed: (!outputids.empty()) is false:

To Reproduce

Steps to reproduce the behavior:

python3 chat.py
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

cm = ChatModule(model="Llama-2-7b-chat-hf-q0f16")
# You can change to other models that you downloaded, for example,
# cm = ChatModule(model="Llama-2-13b-chat-hf-q4f16_1")  # Llama2 13b model

output = cm.generate(
   prompt="What is the meaning of life?",
   progress_callback=StreamToStdout(callback_interval=5),
)

# Print prefill and decode performance statistics
print(f"Statistics: {cm.stats()}\n")

output = cm.generate(
   prompt="How many points did you list out?",
   progress_callback=StreamToStdout(callback_interval=5),
)

Statistics: prefill: 64.5 tok/s, decode: 96.2 tok/s

Traceback (most recent call last):
  File "chat.py", line 20, in <module>
    output = cm.generate(
  File "/usr/local/lib/python3.8/dist-packages/mlc_chat-0.0.0-py3.8-linux-x86_64.egg/mlc_chat/chat_module.py", line 650, in generate
    self._decode()
  File "/usr/local/lib/python3.8/dist-packages/mlc_chat-0.0.0-py3.8-linux-x86_64.egg/mlc_chat/chat_module.py", line 845, in _decode
    self._decode_func()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm-0.12.dev1610+gceaf7b015-py3.8-linux-x86_64.egg/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "/public/llm_chat.cc", line 1272, in mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#8}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
  File "/public/llm_chat.cc", line 790, in mlc::llm::LLMChat::DecodeStep()
  File "/public/llm_chat.cc", line 791, in mlc::llm::LLMChat::DecodeStep()
tvm.error.InternalError: Traceback (most recent call last):
  2: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#8}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /public/llm_chat.cc:1272
  1: mlc::llm::LLMChat::DecodeStep()
        at /public/llm_chat.cc:790
  0: mlc::llm::LLMChat::DecodeStep()
        at /public/llm_chat.cc:791
  File "/public/llm_chat.cc", line 791
InternalError: Check failed: (!output_ids_.empty()) is false: 

Expected behavior

Environment

Additional context

CharlieFRuan commented 1 year ago

Hi @zhangxiao-stack, thanks for reporting! This seems to be a weird issue... I wasn't able to replicate it on my end. Besides, this tutorial (runnable on Colab), which uses the most recent mlc packages, works fine as well.

Would you mind reinstalling the packages and retrying? https://mlc.ai/package/
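
A quick sanity check before rerunning the repro script is to print the installed wheel versions, so an out-of-date package is easy to spot. This is a minimal sketch using only the standard library; the cu118 nightly package names are an assumption and may differ for your CUDA version:

# Print installed versions of the mlc nightlies (names assumed; adjust for
# your CUDA version). An old or missing wheel points to a stale install.
import importlib.metadata as md

for pkg in ("mlc-ai-nightly-cu118", "mlc-chat-nightly-cu118"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is not installed")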

zhangxiao-stack commented 1 year ago

@CharlieFRuan, thanks for the reply. I reinstalled the mlc_chat* packages from source as follows:

step1:

git branch -v
* main 5790c74 [Docs] README revamp (#980)
mkdir build && cd build
python3 ../cmake/gen_cmake_config.py
cmake .. && cmake --build . --parallel $(nproc) && cd ..

step2:

python3 build.py --model Llama-2-7b-chat-hf --hf-path ./dist/models/Llama-2-7b-chat-hf --quantization q0f16 --target cuda

step3:

 ./build/mlc_chat_cli --model Llama-2-7b-chat-hf-q0f16 --device cuda

[/INST]: [02:55:57] /root/mlc-llm/cpp/llm_chat.cc:791: InternalError: Check failed: (!output_ids_.empty()) is false:

I also reinstalled the mlc_chat* packages from the pip wheel as follows:

step1:

download mlc_chat_nightly_cu118-0.1.dev476-cp38-cp38-manylinux_2_28_x86_64.whl
pip install  mlc_chat_nightly_cu118-0.1.dev476-cp38-cp38-manylinux_2_28_x86_64.whl

step2:

from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# From the mlc-llm directory, run
# $ python sample_mlc_chat.py

# Create a ChatModule instance
cm = ChatModule(model="Llama-2-7b-chat-hf-q0f16")
# You can change to other models that you downloaded, for example,
# cm = ChatModule(model="Llama-2-13b-chat-hf-q4f16_1")  # Llama2 13b model

output = cm.generate(
   prompt="What is the meaning of life?",
   progress_callback=StreamToStdout(callback_interval=5),
)
print(output)
# Print prefill and decode performance statistics
print(f"Statistics: {cm.stats()}\n")

output = cm.generate(
   prompt="How many points did you list out?",
   progress_callback=StreamToStdout(callback_interval=5),
)

print(f"Statistics: {cm.stats()}\n")
print(f"Generated text:\n{output}\n")

errors:
Statistics: prefill: 58.6 tok/s, decode: 85.4 tok/s

Traceback (most recent call last):
  File "chat.py", line 22, in <module>
    output = cm.generate(
  File "/usr/local/lib/python3.8/dist-packages/mlc_chat/chat_module.py", line 663, in generate
    self._decode()
  File "/usr/local/lib/python3.8/dist-packages/mlc_chat/chat_module.py", line 900, in _decode
    self._decode_func()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm-0.12.dev1610+gceaf7b015-py3.8-linux-x86_64.egg/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/llm_chat.cc", line 837, in mlc::llm::LLMChat::DecodeStep()
tvm.error.InternalError: Traceback (most recent call last):
  0: mlc::llm::LLMChat::DecodeStep()
        at /workspace/mlc-llm/cpp/llm_chat.cc:837
  File "/workspace/mlc-llm/cpp/llm_chat.cc", line 837
InternalError: Check failed: (!output_ids_.empty()) is false: 

CharlieFRuan commented 1 year ago

The repo does not seem up to date here:

git branch -v
* main 5790c74 [Docs] README revamp (#980)

You can pull again and rebuild from source.

For the prebuilt, mlc_chat_nightly_cu118-0.1.dev476-cp38-cp38-manylinux_2_28_x86_64.whl does not seem up to date either. Could you try:

pip install --pre --force-reinstall mlc-ai-nightly-cu118 mlc-chat-nightly-cu118 -f https://mlc.ai/wheels
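
After reinstalling, a minimal two-turn smoke test should show whether DecodeStep still trips the check. This is just a sketch reusing the model name and ChatModule calls from the script above, with the progress callback omitted:

# Two consecutive generate() calls, mirroring the failing repro above.
from mlc_chat import ChatModule

cm = ChatModule(model="Llama-2-7b-chat-hf-q0f16")
print(cm.generate(prompt="What is the meaning of life?"))
print(cm.generate(prompt="How many points did you list out?"))
print(f"Statistics: {cm.stats()}")
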
zhangxiao-stack commented 1 year ago

@CharlieFRuan I updated the mlc-llm source code:

git branch -v
* main 20131fb Update README.md (#1045)

step1: reinstall mlc-llm from source
step2: build and run Llama-2-7b-chat-hf

 python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target cuda --quantization q0f16 --use-cache=0
 ./build/mlc_chat_cli --model Llama-2-7b-chat-hf-q0f16 --device cuda
[INST]: hi
[/INST]: [02:07:08] /root/mlc-llm.latest/cpp/llm_chat.cc:818: InternalError: Check failed: (!output_ids_.empty()) is false:

step3: build vicuna-7b-v1.1-q0f16 the same way; no errors occurred

python3 -m mlc_llm.build --model vicuna-7b-v1.1 --target cuda --quantization q0f16 --use-cache=0
./build/mlc_chat_cli --model vicuna-7b-v1.1-q0f16 --device cuda
USER: hello
ASSISTANT: Hello! How can I help you today? Is there something you would like to talk about or ask me a question about? I'm here to assist you with any information or guidance you may need.