mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Using Qwen1.5-1.8B-Chat and Qwen1.5-4B-Chat will cause the app to freeze and crash #2000

Closed MrRace closed 4 months ago

MrRace commented 5 months ago

πŸ› Bug

Has anyone encountered a situation where, using Qwen1.5-4B-Chat or Qwen1.5-1.8B-Chat in mlc-llm, the model loads normally after clicking the chat entrance, but once chatting starts with input text the app gets stuck and then, after a while, the entire application crashes?

To enable Qwen1.5-1.8B-Chat and Qwen1.5-4B-Chat to run on Android, model format conversion and app compilation were performed using mlc-llm. After installing the app on the phone, clicking the chat entry loads the Qwen1.5-1.8B-Chat or Qwen1.5-4B-Chat model normally. However, upon entering text to start chatting, the app gets stuck and, after some time, crashes entirely. Has anyone encountered this situation?

It is important to note that Qwen1.5-0.5B-Chat can load the model normally and engage in chatting with user input without any issues. The application crashing scenario described above occurs specifically with the Qwen1.5-4B-Chat and Qwen1.5-1.8B-Chat models. Since the application gets stuck or crashes after loading the model, entering text, and clicking send, there are no log messages available for reference.
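
When the app dies silently like this, one way to still get a trace is to stream device logs from a connected host while reproducing the crash. This is only a minimal sketch; the TVM/mlc log tags below are assumptions, not tag names confirmed by this thread.

# clear old logs, then stream errors plus any TVM/mlc-tagged output while reproducing the crash
adb logcat -c
adb logcat "*:E" "TVM:V" "mlc:V"   # the TVM and mlc tag names are assumptions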

To Reproduce

Steps to reproduce the behavior:

Step 1: Weight Conversion

MODEL_NAME=Qwen1.5-1.8B-Chat
QUANTIZATION=q4f16_1

# convert weights
mlc_llm convert_weight /share_model_zoo/LLM/Qwen/$MODEL_NAME/ --quantization $QUANTIZATION -o dist/$MODEL_NAME-$QUANTIZATION-MLC/
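
As a quick sanity check (not part of the original steps), the converted weights should now be in the output directory; the exact shard filenames depend on the model and are not guaranteed here.

# list the conversion output; expect quantized weight shards (exact filenames may vary)
ls dist/$MODEL_NAME-$QUANTIZATION-MLC/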

Step 2: Generate Configuration Files

mlc_llm gen_config /share_model_zoo/LLM/Qwen/$MODEL_NAME/ --quantization $QUANTIZATION --model-type qwen2 --conv-template chatml -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/
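
An optional peek at the generated configuration, assuming it exposes context_window_size and conv_template fields (the key names are assumptions, not taken from this thread):

# inspect the generated chat config; the grepped key names are assumptions
grep -E '"(context_window_size|conv_template)"' dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json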

Step 3: Model Compilation

# compile: build the model library with the specification in mlc-chat-config.json

mlc_llm compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json --device android qwen2_1.8B-chat_q4f16_1 -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar

This generates the dist/libs/Qwen1.5-1.8B-Chat-q4f16_1-android.tar file.
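
As an optional sanity check (not part of the original steps), the archive contents can be listed to confirm the library was packed:

# list members of the compiled model library archive
tar -tf ./dist/libs/Qwen1.5-1.8B-Chat-q4f16_1-android.tar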

Step 4: Modify app-config.json File

Modify the contents of ./android/library/src/main/assets/app-config.json as follows:

{
  "model_list": [
    {
      "model_url": "https://huggingface.co/mlc-ai/Qwen-1_8B-Chat-q4f16_1-MLC/",
      "model_lib": "qwen_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen-1_8B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat",
      "model_lib": "qwen2_0.5B_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-0.5B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat",
      "model_lib": "qwen2_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-1.8B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/Qwen/Qwen1.5-4B-Chat",
      "model_lib": "qwen2_4B-chat_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-4B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC",
      "model_lib": "gemma_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "gemma-2b-it-q4f16_1"
    }
  ],
  "model_lib_path_for_prepare_libs": {
    "qwen_q4f16_1": "libs/Qwen-1_8B-Chat-q4f16_1-android.tar",
    "qwen2_q4f16_1": "libs/Qwen1.5-1.8B-Chat-q4f16_1-android.tar",
    "qwen2_4B-chat_q4f16_1": "libs/Qwen1.5-4B-Chat-q4f16_1-android.tar",
    "gemma_q4f16_1": "libs/gemma-2b-it-q4f16_1-android.tar"
  }
}
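
An optional check before bundling (not in the original steps): make sure the edited file is still valid JSON.

# validate the edited app-config.json; path taken from Step 4 above
python -m json.tool ./android/library/src/main/assets/app-config.json > /dev/null && echo "app-config.json OK"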

Step 5: Bundle Model Library

# Bundle model library
cd ./android/library
./prepare_libs.sh

Step 6: Build Android App

# Build android app
cd .. && ./gradlew assembleDebug
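
To try the result on a device, the debug APK can be installed over adb; the output path below is the standard Gradle location and is an assumption about this project's layout.

# install the freshly built debug APK (path is an assumption based on default Gradle output)
adb install -r app/build/outputs/apk/debug/app-debug.apk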

Expected behavior

Environment

Additional context

Hzfengsy commented 5 months ago

Please specify --context-window-size for Qwen 1.5. BTW, I just ran it a few days ago and it works. (screenshot attached)
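
A minimal sketch of that suggestion applied to Step 2 above; the value 2048 is an assumption picked to keep KV-cache memory modest on a phone, not a number from this thread.

MODEL_NAME=Qwen1.5-1.8B-Chat
QUANTIZATION=q4f16_1
# regenerate the chat config with an explicit context window (2048 is an assumed value)
mlc_llm gen_config /share_model_zoo/LLM/Qwen/$MODEL_NAME/ --quantization $QUANTIZATION --model-type qwen2 --conv-template chatml --context-window-size 2048 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/
# then rerun Step 3 (compile) and Step 5 (prepare_libs) so the bundled library matches the new config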

MrRace commented 5 months ago

@Hzfengsy Which version of Qwen1.5 are you specifically using? Qwen1.5-0.5B-Chat? Or Qwen1.5-1.8B-Chat? Or Qwen1.5-4B-Chat?

Hzfengsy commented 5 months ago

4B

zw0241 commented 4 months ago

@Hzfengsy I am facing the same issue. Is your issue resolved now? "[Qwen1.5-1.8B-Chat] in mlc-llm, when clicking the chat entrance, the model can be loaded normally, but after starting to chat with input text, it gets stuck, and then after a while the entire application crashes."