mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Compiling a custom model not working #1104

Closed · aadarsh-ram closed this issue 1 year ago

aadarsh-ram commented 1 year ago

🐛 Bug

I am trying to compile the pankajmathur/orca_mini_3b model with mlc-llm, but I get the following error:

```
(myenv) aadarsh@AAD-HPLAP:~/src/mlc-llm$ python3 -m mlc_llm.build --hf-path pankajmathur/orca_mini_3b --target vulkan --quantization q4f16_1
Weights exist at dist/models/orca_mini_3b, skipping download.
Using path "dist/models/orca_mini_3b" for model "orca_mini_3b"
Target configured: vulkan -keys=vulkan,gpu -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -supports_16bit_buffer=1 -supports_8bit_buffer=1 -supports_float16=1 -supports_float32=1 -supports_int16=1 -supports_int32=1 -supports_int8=1 -supports_storage_buffer_storage_class=1 -thread_warp_size=1
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Automatically using target for weight quantization: vulkan -keys=vulkan,gpu -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -supports_16bit_buffer=1 -supports_float16=1 -supports_float32=1 -supports_int16=1 -supports_int32=1 -supports_int8=1 -thread_warp_size=1
Get old param:   0%|          | 0/161 [00:00<?, ?tensors/s]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/aadarsh/src/mlc-llm/mlc_llm/build.py", line 46, in <module>
    main()
  File "/home/aadarsh/src/mlc-llm/mlc_llm/build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "/home/aadarsh/src/mlc-llm/mlc_llm/core.py", line 648, in build_model_from_args
    new_params = utils.convert_weights(param_manager, params, args)
  File "/home/aadarsh/src/mlc-llm/mlc_llm/utils.py", line 271, in convert_weights
    vm = relax.vm.VirtualMachine(ex, device)
  File "/home/aadarsh/.local/lib/python3.8/site-packages/tvm/runtime/relax_vm.py", line 81, in __init__
    rt_mod = rt_mod.jit()
  File "/home/aadarsh/.local/lib/python3.8/site-packages/tvm/relax/vm_build.py", line 89, in jit
    not_runnable_list = self.mod._collect_from_import_tree(_not_runnable)
  File "/home/aadarsh/.local/lib/python3.8/site-packages/tvm/runtime/module.py", line 430, in _collect_from_import_tree
    assert (
AssertionError: Module stackvm should be either dso exportable or binary serializable.
```

On searching, I found that this error might be related to TVM being compiled without LLVM, so I tried installing one of your pre-built wheels. However, when I checked the build flags, LLVM was not enabled in that wheel either. So I resorted to building TVM from source, following the steps given here, but I am still getting this error.
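For reference, this is roughly how I checked the build flags (a sketch using `tvm.support.libinfo()`, which reports the CMake options the library was compiled with; exact key names may vary across TVM versions):

```bash
# Query the build flags of the TVM library that Python picks up.
# "USE_LLVM: OFF" (or a missing key) would explain the stackvm fallback above.
python3 -c "import tvm; print('USE_LLVM:', tvm.support.libinfo().get('USE_LLVM'))"
```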

Please let me know what else I need to do in order to compile the model.

Environment

aadarsh-ram commented 1 year ago

I think I found my error: installing tvm-unity and pointing to it via an environment variable worked for me. The steps are mentioned here. Thanks!
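For anyone else who hits this: the setup looks roughly like the sketch below, assuming a local tvm-unity build with LLVM enabled. The paths are placeholders, so adjust them to your own checkout:

```bash
# Placeholder path: point TVM_HOME at your tvm-unity build
# (one compiled with USE_LLVM enabled).
export TVM_HOME=/path/to/tvm-unity
export PYTHONPATH=$TVM_HOME/python:$PYTHONPATH

# Verify that Python now picks up the tvm-unity build
# before re-running mlc_llm.build.
python3 -c "import tvm; print(tvm.__file__)"
```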