mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

Prebuilt StableLM 1.6B model compilation not working #2283

Closed saurav-pwh-old closed 5 months ago

saurav-pwh-old commented 6 months ago

πŸ› Bug

I am trying to work with the StableLM 1.6B model, but I am getting an error in the model compilation step.

To Reproduce

Steps to reproduce the behavior:

  1. Library Installation:
    !python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu122 mlc-ai-nightly-cu122
    !git lfs install
    !mkdir -p dist
    !git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs

  2. Downloading the model and compiling it:

    !cd dist && git clone https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f32_1-MLC
    !mkdir ./dist/libs
    !mlc_llm compile /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json \
    --device cuda -o /content/dist/libs/stablelm-2-zephyr-1_6b-q4f32_1-cuda.so

    Error

    Cloning into 'stablelm-2-zephyr-1_6b-q4f32_1-MLC'...
    remote: Enumerating objects: 47, done.
    remote: Counting objects: 100% (44/44), done.
    remote: Compressing objects: 100% (44/44), done.
    remote: Total 47 (delta 5), reused 0 (delta 0), pack-reused 3 (from 1)
    Unpacking objects: 100% (47/47), 2.38 MiB | 4.56 MiB/s, done.
    Filtering content: 100% (27/27), 882.66 MiB | 85.51 MiB/s, done.
    mkdir: cannot create directory './dist/libs': File exists
    [2024-05-06 13:36:58] INFO auto_config.py:69: Found model configuration: /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json
    [2024-05-06 13:37:02] INFO auto_device.py:79: Found device: cuda:0
    [2024-05-06 13:37:02] INFO auto_target.py:71: Found configuration of target device "cuda:0": {"thread_warp_size": 32, "arch": "sm_75", "max_threads_per_block": 1024, "max_num_threads": 1024, "kind": "cuda", "max_shared_memory_per_block": 49152, "tag": "", "keys": ["cuda", "gpu"]}
    [2024-05-06 13:37:02] INFO auto_target.py:103: Found host LLVM triple: x86_64-redhat-linux-gnu
    [2024-05-06 13:37:02] INFO auto_target.py:104: Found host LLVM CPU: skylake-avx512
    [2024-05-06 13:37:02] INFO auto_target.py:317: Generating code for CUDA architecture: sm_75
    [2024-05-06 13:37:02] INFO auto_target.py:318: To produce multi-arch fatbin, set environment variable MLC_MULTI_ARCH. Example: MLC_MULTI_ARCH=70,72,75,80,86,87,89,90a
    [2024-05-06 13:37:02] INFO auto_config.py:153: Found model type: stablelm_epoch. Use `--model-type` to override.
    Traceback (most recent call last):
      File "/usr/local/bin/mlc_llm", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 25, in main
        cli.main(sys.argv[2:])
      File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/compile.py", line 120, in main
        parsed.model_type = detect_model_type(parsed.model_type, parsed.model)
      File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/auto_config.py", line 155, in detect_model_type
        raise ValueError(f"Unknown model type: {model_type}. Available ones: {list(MODELS.keys())}")
    ValueError: Unknown model type: stablelm_epoch. Available ones: ['llama', 'mistral', 'gemma', 'gpt2', 'mixtral', 'gpt_neox', 'gpt_bigcode', 'phi-msft', 'phi', 'qwen', 'qwen2', 'stablelm', 'baichuan', 'internlm', 'rwkv5', 'orion', 'llava', 'rwkv6', 'chatglm', 'eagle']
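
The traceback shows the failure happens in `detect_model_type`: the `model_type` recorded in the downloaded mlc-chat-config.json is the older `stablelm_epoch` name, which no longer appears in the registered model list. This can be confirmed directly on the checkout (a minimal sketch; the path is taken from the reproduction steps above):

    !grep '"model_type"' /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json
    # Expected to print the stale identifier from the error, e.g.
    #   "model_type": "stablelm_epoch",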

Expected behavior

A file named stablelm-2-zephyr-1_6b-q4f32_1-cuda.so should be created inside the libs directory.

Environment

(truncated nvidia-smi output; only the header of an empty Processes table was captured)

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=======================================================================================|

Additional context

I am using Google Colab to do all of this.

tqchen commented 6 months ago

@tlopex can you look a bit into this model?

tlopex commented 6 months ago

@saurav-pwh-old Hi, as a temporary workaround you can pass `--model-type stablelm` in your command to override the detected type.
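
Applied to the reproduction steps above, the full invocation would be (a sketch; only the `--model-type` flag is added relative to the original command):

    !mlc_llm compile /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json \
    --model-type stablelm \
    --device cuda -o /content/dist/libs/stablelm-2-zephyr-1_6b-q4f32_1-cuda.so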

ollmer commented 5 months ago

Hi, I have the same issue. When I tried to use --model-type stablelm, I got a new error:

TypeError: StableLmConfig.__init__() missing 2 required positional arguments: 'layer_norm_eps' and 'partial_rotary_factor'
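
Those two arguments correspond to fields the updated upstream StableLM 2 config defines (`layer_norm_eps`, `partial_rotary_factor`). A quick check for whether the converted checkout carries them (a sketch, assuming the fields would appear in the config JSON shipped with the MLC repo):

    !grep -E '"(layer_norm_eps|partial_rotary_factor)"' \
        /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json
    # No matches would mean the converted config predates the upstream change,
    # which is consistent with the TypeError above.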

tlopex commented 5 months ago

@ollmer Thank you for pointing that out! The reason is that the official StableLM 2 model has been updated and there are some differences in its parameters, so what we have on huggingface.co/mlc-ai may be obsolete. We will upload an updated version soon.

tlopex commented 5 months ago

Hello, everyone! Sorry for the long wait. Thanks to @MasterJH5574's help, I have uploaded the stablelm2_1.6b models below: [image] And I tested it here:

(mlc-prebuilt) tlopex@tlopex-OMEN-by-HP-Laptop-17-ck1xxx:~/mlc-llm$ python -m mlc_llm chat HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC --device "cuda:0" --overrides context_window_size=4096 --opt "O2"
[2024-05-24 18:40:07] INFO config.py:106: Overriding context_window_size from None to 4096
[2024-05-24 18:40:09] INFO auto_device.py:79: Found device: cuda:0
[2024-05-24 18:40:09] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC
[2024-05-24 18:40:09] INFO download.py:42: [Git] Cloning https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC.git to /tmp/tmpq4lci24o/tmp
[2024-05-24 18:40:12] INFO download.py:78: [Git LFS] Downloading 0 files with Git LFS: []
0it [00:00, ?it/s]
[2024-05-24 18:40:18] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_3.bin to /tmp/tmpq4lci24o/tmp/params_shard_3.bin
[2024-05-24 18:40:21] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_2.bin to /tmp/tmpq4lci24o/tmp/params_shard_2.bin
[2024-05-24 18:40:26] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_4.bin to /tmp/tmpq4lci24o/tmp/params_shard_4.bin
[2024-05-24 18:40:32] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_0.bin to /tmp/tmpq4lci24o/tmp/params_shard_0.bin
[2024-05-24 18:40:33] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_6.bin to /tmp/tmpq4lci24o/tmp/params_shard_6.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_1.bin to /tmp/tmpq4lci24o/tmp/params_shard_1.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_5.bin to /tmp/tmpq4lci24o/tmp/params_shard_5.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_7.bin to /tmp/tmpq4lci24o/tmp/params_shard_7.bin
[2024-05-24 18:40:38] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_8.bin to /tmp/tmpq4lci24o/tmp/params_shard_8.bin
[2024-05-24 18:40:40] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_9.bin to /tmp/tmpq4lci24o/tmp/params_shard_9.bin
[2024-05-24 18:40:41] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_10.bin to /tmp/tmpq4lci24o/tmp/params_shard_10.bin
[2024-05-24 18:40:46] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_12.bin to /tmp/tmpq4lci24o/tmp/params_shard_12.bin
[2024-05-24 18:40:46] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_13.bin to /tmp/tmpq4lci24o/tmp/params_shard_13.bin
[2024-05-24 18:40:48] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_11.bin to /tmp/tmpq4lci24o/tmp/params_shard_11.bin
[2024-05-24 18:40:51] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_15.bin to /tmp/tmpq4lci24o/tmp/params_shard_15.bin
[2024-05-24 18:40:51] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_14.bin to /tmp/tmpq4lci24o/tmp/params_shard_14.bin
[2024-05-24 18:40:52] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_16.bin to /tmp/tmpq4lci24o/tmp/params_shard_16.bin
[2024-05-24 18:40:52] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_17.bin to /tmp/tmpq4lci24o/tmp/params_shard_17.bin
[2024-05-24 18:40:56] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_18.bin to /tmp/tmpq4lci24o/tmp/params_shard_18.bin
[2024-05-24 18:40:57] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_20.bin to /tmp/tmpq4lci24o/tmp/params_shard_20.bin
[2024-05-24 18:40:58] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_21.bin to /tmp/tmpq4lci24o/tmp/params_shard_21.bin
[2024-05-24 18:41:00] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_22.bin to /tmp/tmpq4lci24o/tmp/params_shard_22.bin
[2024-05-24 18:41:01] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_23.bin to /tmp/tmpq4lci24o/tmp/params_shard_23.bin
[2024-05-24 18:41:03] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_24.bin to /tmp/tmpq4lci24o/tmp/params_shard_24.bin
[2024-05-24 18:41:05] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_26.bin to /tmp/tmpq4lci24o/tmp/params_shard_26.bin
[2024-05-24 18:41:05] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_25.bin to /tmp/tmpq4lci24o/tmp/params_shard_25.bin
[2024-05-24 18:41:08] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_19.bin to /tmp/tmpq4lci24o/tmp/params_shard_19.bin
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 27/27 [00:55<00:00,  2.05s/it]
[2024-05-24 18:41:08] INFO download.py:155: Moving /tmp/tmpq4lci24o/tmp to /home/tlopex/.cache/mlc_llm/model_weights/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC
[2024-05-24 18:41:08] INFO chat_module.py:781: Now compiling model lib on device...
[2024-05-24 18:41:08] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-05-24 18:41:08] INFO jit.py:160: Using cached model lib: /home/tlopex/.cache/mlc_llm/model_lib/489dc4831dc725c82bd025a54da84013.so
[2024-05-24 18:41:09] INFO model_metadata.py:96: Total memory usage: 1756.66 MB (Parameters: 882.66 MB. KVCache: 0.00 MB. Temporary buffer: 874.00 MB)
[2024-05-24 18:41:09] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.

<|user|>: Hello!
<|assistant|>: 
Hello! How can I assist you today?

So I believe you all can use it as well. Enjoy trying it!
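
For Colab users like the original reporter, the same test should carry over with the notebook `!` prefix (a sketch, using the q4f16_1 variant tested above; the q4f32_1 variant from the original report would be invoked the same way if available):

    !python3 -m mlc_llm chat HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC \
        --device "cuda:0" --overrides context_window_size=4096 --opt "O2"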

tqchen commented 5 months ago

Thanks @tlopex !