🐛 Bug

I am trying to compile this model (pankajmathur/orca_mini_3b) with mlc-llm, and I am getting the following error:
(myenv) aadarsh@AAD-HPLAP:~/src/mlc-llm$ python3 -m mlc_llm.build --hf-path pankajmathur/orca_mini_3b --target vulkan --quantization q4f16_1
Weights exist at dist/models/orca_mini_3b, skipping download.
Using path "dist/models/orca_mini_3b" for model "orca_mini_3b"
Target configured: vulkan -keys=vulkan,gpu -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -supports_16bit_buffer=1 -supports_8bit_buffer=1 -supports_float16=1 -supports_float32=1 -supports_int16=1 -supports_int32=1 -supports_int8=1 -supports_storage_buffer_storage_class=1 -thread_warp_size=1
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Automatically using target for weight quantization: vulkan -keys=vulkan,gpu -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -supports_16bit_buffer=1 -supports_float16=1 -supports_float32=1 -supports_int16=1 -supports_int32=1 -supports_int8=1 -thread_warp_size=1
Get old param:   0%|          | 0/161 [00:00<?, ?tensors/s]
Set new param:   0%|          | 0/267 [00:00<?, ?tensors/s]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/aadarsh/src/mlc-llm/mlc_llm/build.py", line 46, in <module>
    main()
  File "/home/aadarsh/src/mlc-llm/mlc_llm/build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "/home/aadarsh/src/mlc-llm/mlc_llm/core.py", line 648, in build_model_from_args
    new_params = utils.convert_weights(param_manager, params, args)
  File "/home/aadarsh/src/mlc-llm/mlc_llm/utils.py", line 271, in convert_weights
    vm = relax.vm.VirtualMachine(ex, device)
  File "/home/aadarsh/.local/lib/python3.8/site-packages/tvm/runtime/relax_vm.py", line 81, in __init__
    rt_mod = rt_mod.jit()
  File "/home/aadarsh/.local/lib/python3.8/site-packages/tvm/relax/vm_build.py", line 89, in jit
    not_runnable_list = self.mod._collect_from_import_tree(_not_runnable)
  File "/home/aadarsh/.local/lib/python3.8/site-packages/tvm/runtime/module.py", line 430, in _collect_from_import_tree
    assert (
AssertionError: Module stackvm should be either dso exportable or binary serializable.
On searching, I found that this error might be related to TVM not being compiled with LLVM, so I tried installing one of your pre-built wheels. However, when I checked the build flags, that wheel did not have LLVM enabled. I then built TVM from source following the steps given here, but I am still getting this error.
Please let me know what else I need to do in order to compile the model.
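For reference, this is how I checked the build flags on each install (a minimal sketch using tvm.support.libinfo(), the same call suggested in the issue template below):

```python
import tvm

# Print the build-time flags baked into the installed TVM library.
info = tvm.support.libinfo()
for key in ("USE_LLVM", "LLVM_VERSION", "USE_VULKAN"):
    print(key, ":", info[key])
```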
Environment
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu (WSL2)
How you installed MLC-LLM (conda, source): Followed the steps mentioned here
How you installed TVM-Unity (pip, source): Built from source
Python version: 3.8
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU: OFF
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: OFF
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 30b4fa3c13fc80d5c9151a9dc445d22c57ced3e0
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-10-17 21:33:54 -0700
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 17.0.3
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: none
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
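One thing I notice in the dump above: USE_VULKAN is OFF even though the build targets vulkan. A quick probe I can run to check whether this TVM build can reach a Vulkan device at all (a minimal sketch using TVM's runtime device API):

```python
import tvm

# Ask the TVM runtime whether a Vulkan device is visible.
# With USE_VULKAN set to OFF at build time, this should print False.
dev = tvm.vulkan(0)
print("Vulkan device exists:", dev.exist)
```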