mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0
19.08k stars · 1.56k forks

[Bug] Compiling android model of Llama-2-7b-chat-hf on Windows #1079

Closed · ashmon closed 11 months ago

ashmon commented 1 year ago

🐛 Bug

Error encountered on the latest build when compiling the Android model of Llama-2-7b-chat-hf on Windows.

python -m mlc_llm.build --target android --max-seq-len 768 --model dist/models/Llama-2-7b-chat-hf --quantization q4f16_1

Target configured: opencl -keys=opencl,gpu -max_function_args=128 -max_num_threads=256 -max_shared_memory_per_block=16384 -max_threads_per_block=256 -texture_spatial_limit=16384 -thread_warp_size=1
[23:26:03] D:\a\package\package\tvm\src\node\reflection.cc:109: AttributeError: relax.expr.Var object has no attributed shard_dim
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.

[23:26:03] D:\a\package\package\tvm\src\node\reflection.cc:109: AttributeError: relax.expr.Var object has no attributed shard_strategy
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.

Those two attribute errors repeat roughly 100 times, and then the index-out-of-bounds check finally fails (valid indices for a 197-element tuple run from 0 to 196):

[23:26:05] D:\a\package\package\tvm\src\relax\ir\expr.cc:174: Check failed: index < tuple_info->fields.size() (197 vs. 197) : Index out of bounds: Tuple params is of size 197, and cannot be accessed with index 197
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "c:\Users\UserName\project\mlc-llm\mlc_llm\build.py", line 46, in <module>
    main()
  File "c:\Users\UserName\project\mlc-llm\mlc_llm\build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "c:\Users\UserName\project\mlc-llm\mlc_llm\core.py", line 645, in build_model_from_args
    new_params = utils.convert_weights(param_manager, params, args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\UserName\project\mlc-llm\mlc_llm\utils.py", line 229, in convert_weights
    mod_transform = relax.transform.LazyTransformParams()(mod_transform)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\UserName\anaconda3\envs\project\Lib\site-packages\tvm\ir\transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\UserName\anaconda3\envs\project\Lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "C:\Users\UserName\anaconda3\envs\project\Lib\site-packages\tvm\_ffi\base.py", line 415, in raise_last_ffi_error
    _LIB.TVMGetLastPythonError.restype = ctypes.c_void_p
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\UserName\anaconda3\envs\project\Lib\ctypes\__init__.py", line 389, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\UserName\anaconda3\envs\project\Lib\ctypes\__init__.py", line 394, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: function 'TVMGetLastPythonError' not found. Did you mean: 'TVMAPISetLastPythonError'?
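
Side note: the final AttributeError above suggests the tvm.dll that actually loaded does not export the symbol the Python frontend expects, i.e. the Python files and the DLL may come from different builds. Here is my own minimal ctypes probe (not from the docs, just a sketch against the same conda layout as above) to confirm which of the two symbols the DLL really exports:

import tvm

lib = tvm._ffi.base._LIB  # the ctypes CDLL handle shown in the traceback
for sym in ("TVMGetLastPythonError", "TVMAPISetLastPythonError"):
    try:
        getattr(lib, sym)  # ctypes raises AttributeError if the export is missing
        print(sym, "-> found")
    except AttributeError:
        print(sym, "-> missing")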

Steps to reproduce the behavior:

This is following the Android app instructions outlined at https://llm.mlc.ai/docs/deploy/android.html

Environment

Windows 10 10.0.19045, building for Android. Rust installed and exposed to PATH. Android Studio installed and configured. OpenJDK installed. All env vars configured and independently verified.

The setup has now been tested on two different computers with similar but not identical specs:

fresh conda env, Python 3.11
mlc-llm cloned with --recursive
tvm installed via pip: python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
zstd.dll downloaded and copied to the conda env's site-packages/tvm/
Llama-2-7b-chat-hf pulled via git into the correct dist/models/Llama-2-7b-chat-hf subfolder

To verify TVM, here is the supporting info.

python -c "import tvm; print(tvm.__file__)"
C:\Users\UserName\anaconda3\envs\project\Lib\site-packages\tvm\__init__.py

python -c "import tvm; print(tvm._ffi.base._LIB)"
<CDLL 'C:\Users\UserName\anaconda3\envs\project\Lib\site-packages\tvm\tvm.dll', handle 7ffd3a1b0000 at 0x2275b9db7d0>

(project) c:\Users\UserName\project\mlc-llm>python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"

USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 62c05266986ea6639a9fd16fb87ba75a9ec056a8
USE_VULKAN: ON
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-10-07 16:42:11 -0700
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 17.0.2
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.35.32215/bin/HostX64/x64/cl.exe
HIDE_PRIVATE_SYMBOLS: OFF

Let me know if there is any other info I can gather for you. -Cort

junrushao commented 1 year ago

I noticed three issues with your report:

The first one, which seems related to sharding:

[23:26:03] D:\a\package\package\tvm\src\node\reflection.cc:109: AttributeError: relax.expr.Var object has no attributed shard_dim
[23:26:03] D:\a\package\package\tvm\src\node\reflection.cc:109: AttributeError: relax.expr.Var object has no attributed shard_strategy

And the second one is:

[23:26:05] D:\a\package\package\tvm\src\relax\ir\expr.cc:174: Check failed: index < tuple_info->fields.size() (197 vs. 197) : Index out of bounds: Tuple params is of size 197, and cannot be accessed with index 197

The last one seems to be related to a recent change in upstream TVM (CC the author @Lunderberg in case you'd like to take a look):

AttributeError: function 'TVMGetLastPythonError' not found. Did you mean: 'TVMAPISetLastPythonError'?

Would you mind providing the full Python stack traces for the first two issues? The first one in particular would be helpful to me :)
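
By the way, if the C++ trace is being suppressed, setting TVM_BACKTRACE=1 in the environment before rerunning the build sometimes surfaces a fuller backtrace. This only helps if the wheel was built with backtrace support, and the "disabled at compile time" message above suggests the Windows nightly may not be, so treat it as a best-effort suggestion:

set TVM_BACKTRACE=1
python -m mlc_llm.build --target android --max-seq-len 768 --model dist/models/Llama-2-7b-chat-hf --quantization q4f16_1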

junrushao commented 1 year ago

Related: https://github.com/mlc-ai/mlc-llm/issues/1112

junrushao commented 1 year ago

Update: https://github.com/mlc-ai/mlc-llm/issues/1112#issuecomment-1776503604

junrushao commented 11 months ago

Fixed