mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Using Qwen1.5-1.8B-Chat and Qwen1.5-4B-Chat will cause the app to freeze and crash #2000

Closed MrRace closed 4 months ago

MrRace commented 5 months ago

πŸ› Bug

Has anyone encountered a situation where, using Qwen1.5-4B-Chat or Qwen1.5-1.8B-Chat in mlc-llm, the model loads normally after clicking the chat entrance, but once chatting starts with input text the app gets stuck and then, after a while, the entire application crashes?

To enable Qwen1.5-1.8B-Chat and Qwen1.5-4B-Chat to run on Android, model format conversion and app compilation were performed using mlc-llm. After installing the app on the phone, clicking the chat entry loads the Qwen1.5-1.8B-Chat or Qwen1.5-4B-Chat model normally. However, upon entering text to start chatting, the app gets stuck and, after some time, crashes entirely. Has anyone encountered this situation?

It is important to note that Qwen1.5-0.5B-Chat can load the model normally and engage in chatting with user input without any issues. The application crashing scenario described above occurs specifically with the Qwen1.5-4B-Chat and Qwen1.5-1.8B-Chat models. Since the application gets stuck or crashes after loading the model, entering text, and clicking send, there are no log messages available for reference.
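
When the app dies silently like this, one way to still get a trace is to stream device logs from a connected host while reproducing the crash. This is only a minimal sketch; the TVM/mlc log tags below are assumptions, not tag names confirmed by this thread.

# clear old logs, then stream errors plus any TVM/mlc-tagged output while reproducing the crash
adb logcat -c
adb logcat "*:E" "TVM:V" "mlc:V"   # the TVM and mlc tag names are assumptions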

To Reproduce

Steps to reproduce the behavior:

Step 1: Weight Conversion

MODEL_NAME=Qwen1.5-1.8B-Chat
QUANTIZATION=q4f16_1

# convert weights
mlc_llm convert_weight /share_model_zoo/LLM/Qwen/$MODEL_NAME/ --quantization $QUANTIZATION -o dist/$MODEL_NAME-$QUANTIZATION-MLC/
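
As a quick sanity check (not part of the original steps), the converted weights should now be in the output directory; the exact shard filenames depend on the model and are not guaranteed here.

# list the conversion output; expect quantized weight shards (exact filenames may vary)
ls dist/$MODEL_NAME-$QUANTIZATION-MLC/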

Step 2: Generate Configuration Files

mlc_llm gen_config /share_model_zoo/LLM/Qwen/$MODEL_NAME/ --quantization $QUANTIZATION --model-type qwen2 --conv-template chatml -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/
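
An optional peek at the generated configuration, assuming it exposes context_window_size and conv_template fields (the key names are assumptions, not taken from this thread):

# inspect the generated chat config; the grepped key names are assumptions
grep -E '"(context_window_size|conv_template)"' dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json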

Step 3: Model Compilation

# compile: build the model library with the specification in mlc-chat-config.json

mlc_llm compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json --device android qwen2_1.8B-chat_q4f16_1 -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar

This generates the dist/libs/Qwen1.5-1.8B-Chat-q4f16_1-android.tar file.
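
As an optional sanity check (not part of the original steps), the archive contents can be listed to confirm the library was packed:

# list members of the compiled model library archive
tar -tf ./dist/libs/Qwen1.5-1.8B-Chat-q4f16_1-android.tar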

Step 4: Modify app-config.json File

Modify the contents of ./android/library/src/main/assets/app-config.json as follows:

{
  "model_list": [
    {
      "model_url": "https://huggingface.co/mlc-ai/Qwen-1_8B-Chat-q4f16_1-MLC/",
      "model_lib": "qwen_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen-1_8B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat",
      "model_lib": "qwen2_0.5B_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-0.5B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat",
      "model_lib": "qwen2_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-1.8B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/Qwen/Qwen1.5-4B-Chat",
      "model_lib": "qwen2_4B-chat_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-4B-Chat-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC",
      "model_lib": "gemma_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "gemma-2b-it-q4f16_1"
    }
  ],
  "model_lib_path_for_prepare_libs": {
    "qwen_q4f16_1": "libs/Qwen-1_8B-Chat-q4f16_1-android.tar",
    "qwen2_q4f16_1": "libs/Qwen1.5-1.8B-Chat-q4f16_1-android.tar",
    "qwen2_4B-chat_q4f16_1": "libs/Qwen1.5-4B-Chat-q4f16_1-android.tar",
    "gemma_q4f16_1": "libs/gemma-2b-it-q4f16_1-android.tar"
  }
}
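
An optional check before bundling (not in the original steps): make sure the edited file is still valid JSON.

# validate the edited app-config.json; path taken from Step 4 above
python -m json.tool ./android/library/src/main/assets/app-config.json > /dev/null && echo "app-config.json OK"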

Step 5: Bundle Model Library

# Bundle model library
cd ./android/library
./prepare_libs.sh

Step 6: Build Android App

# Build android app
cd .. && ./gradlew assembleDebug
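
To try the result on a device, the debug APK can be installed over adb; the output path below is the standard Gradle location and is an assumption about this project's layout.

# install the freshly built debug APK (path is an assumption based on default Gradle output)
adb install -r app/build/outputs/apk/debug/app-debug.apk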

Expected behavior

Environment

Additional context

Hzfengsy commented 5 months ago

Please specify --context-window-size for Qwen 1.5. BTW, I just ran it a few days ago and it works. (screenshot attached)
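
A minimal sketch of that suggestion applied to Step 2 above; the value 2048 is an assumption picked to keep KV-cache memory modest on a phone, not a number from this thread.

MODEL_NAME=Qwen1.5-1.8B-Chat
QUANTIZATION=q4f16_1
# regenerate the chat config with an explicit context window (2048 is an assumed value)
mlc_llm gen_config /share_model_zoo/LLM/Qwen/$MODEL_NAME/ --quantization $QUANTIZATION --model-type qwen2 --conv-template chatml --context-window-size 2048 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/
# then rerun Step 3 (compile) and Step 5 (prepare_libs) so the bundled library matches the new config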

MrRace commented 5 months ago

@Hzfengsy Which version of Qwen1.5 are you specifically using? Qwen1.5-0.5B-Chat? Or Qwen1.5-1.8B-Chat? Or Qwen1.5-4B-Chat?

Hzfengsy commented 5 months ago

4B

zw0241 commented 4 months ago

@Hzfengsy I am facing the same issue. Is your issue resolved now? "[Qwen1.5-1.8B-Chat] in mlc-llm, when clicking the chat entrance, the model can be loaded normally, but after starting to chat with input text, it gets stuck, and then after a while the entire application crashes."