Closed: giovannizinzi closed this issue 1 year ago
CC: @Hzfengsy
Hi, @giovannizinzi. Would you check whether your model directory contains only safetensors? Without --use-safetensors, the build script looks for .bin files, and this might be the issue.
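For example (a sketch only; the model name, target, and quantization flags below are simply the ones used later in this thread, so adjust them to your setup):

# Check which weight files the model directory actually contains
ls dist/models/rwkv-raven-1b5
# If only *.safetensors shards are present, tell the build to read them explicitly
python3 -m mlc_llm.build --model rwkv-raven-1b5 --target iphone --quantization q4f16_2 --use-safetensors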
@sunggg interesting, the model directory I was trying to pull from on Hugging Face does not contain safetensors (instead it has .bin shards): RWKVLink
I think I see the problem. Next I am going to git lfs clone the model's weights locally into dist/models and then run the build commands; I will report back if it works.
Closing this issue. Thanks for the pointer, @sunggg; got it working.
I git lfs cloned the rwkv-raven-1b5 Hugging Face repository (https://huggingface.co/RWKV/rwkv-raven-1b5/tree/main) into dist/models.
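For anyone else hitting this, the clone step was roughly the following (a sketch; it assumes git-lfs is installed so the large weight files are actually downloaded rather than just the LFS pointer files):

# Enable Git LFS, then clone the weights directly into dist/models
git lfs install
git clone https://huggingface.co/RWKV/rwkv-raven-1b5 dist/models/rwkv-raven-1b5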
From there, I ran
python3 -m mlc_llm.build --model rwkv-raven-1b5 --target iphone --quantization q4f16_2
and everything worked as intended. I will just git clone things locally before compiling instead of using the HF flag. Thanks!
🐛 Bug
Trying to use rwkv-raven-1b5. The build doesn't seem to load the model from HF; instead it throws an error ("Multiple weight shard files without json map is not supported"). Any advice on how to troubleshoot?
macOS Sonoma 14.0, M2 chip; TVM and MLC installed in a conda environment. Followed the instructions at https://llm.mlc.ai/docs/compilation/compile_models.html
To Reproduce
Weights exist at dist/models/rwkv-raven-1b5, skipping download.
Using path "dist/models/rwkv-raven-1b5" for model "rwkv-raven-1b5"
Host CPU dection:
  Target triple: arm64-apple-darwin23.0.0
  Process triple: arm64-apple-darwin23.0.0
  Host CPU: apple-m1
Target configured: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/build.py", line 46, in <module>
    main()
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/core.py", line 643, in build_model_from_args
    param_manager.init_torch_pname_to_bin_name(args.use_safetensors)
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/relax_model/param_manager.py", line 292, in init_torch_pname_to_bin_name
    mapping = load_torch_pname2binname_map(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/relax_model/param_manager.py", line 920, in load_torch_pname2binname_map
    raise ValueError("Multiple weight shard files without json map is not supported")
ValueError: Multiple weight shard files without json map is not supported
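For reference, the error says the loader found several .bin shard files but no JSON map describing which parameter lives in which shard. A quick sanity check of the local directory (the pytorch_model.bin.index.json name is the usual Hugging Face convention for that map and is an assumption here, not confirmed against this repo):

# List the sharded torch checkpoints and look for the accompanying index file
ls dist/models/rwkv-raven-1b5/*.bin
ls dist/models/rwkv-raven-1b5/pytorch_model.bin.index.json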
Expected behavior
Would like the build to work.
Environment
- How you installed MLC-LLM (conda, source): yes
- How you installed TVM-Unity (pip, source): yes
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):