Closed: giovannizinzi closed this issue 1 year ago
CC: @Hzfengsy
Hi, @giovannizinzi. Would you check whether your model directory contains only safetensors? Without --use-safetensors, the build script looks for .bin files, and this might be the issue.
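For example (a sketch only; the model name, target, and quantization flags below are simply the ones used later in this thread, so adjust them to your setup):

# Check which weight files the model directory actually contains
ls dist/models/rwkv-raven-1b5
# If only *.safetensors shards are present, tell the build to read them explicitly
python3 -m mlc_llm.build --model rwkv-raven-1b5 --target iphone --quantization q4f16_2 --use-safetensors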
@sunggg interesting, the model directory I was trying to pull from on Hugging Face does not contain safetensors (instead it has .bin shards): RWKVLink
I think I see the problem. Next I am going to git lfs clone the model's weights locally into dist/models and then run the build commands; I will report back if it works.
Closing this issue. Thanks for the pointer, @sunggg; got it working.
I git lfs cloned the rwkv-raven-1b5 Hugging Face repository (https://huggingface.co/RWKV/rwkv-raven-1b5/tree/main) into dist/models.
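For anyone else hitting this, the clone step was roughly the following (a sketch; it assumes git-lfs is installed so the large weight files are actually downloaded rather than just the LFS pointer files):

# Enable Git LFS, then clone the weights directly into dist/models
git lfs install
git clone https://huggingface.co/RWKV/rwkv-raven-1b5 dist/models/rwkv-raven-1b5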
From there, I ran
python3 -m mlc_llm.build --model rwkv-raven-1b5 --target iphone --quantization q4f16_2
and everything worked as intended. I will just git clone things locally before compiling instead of using the HF flag. Thanks!
🐛 Bug
Trying to use rwkv-raven-1b5. The build doesn't seem to load the model from HF; instead it throws an error ("Multiple weight shard files without json map is not supported"). Any advice on how to troubleshoot?
macOS Sonoma 14.0, M2 chip; TVM and MLC installed in a conda environment. Followed the instructions at https://llm.mlc.ai/docs/compilation/compile_models.html
To Reproduce
Weights exist at dist/models/rwkv-raven-1b5, skipping download.
Using path "dist/models/rwkv-raven-1b5" for model "rwkv-raven-1b5"
Host CPU dection:
  Target triple: arm64-apple-darwin23.0.0
  Process triple: arm64-apple-darwin23.0.0
  Host CPU: apple-m1
Target configured: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/build.py", line 46, in <module>
    main()
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/core.py", line 643, in build_model_from_args
    param_manager.init_torch_pname_to_bin_name(args.use_safetensors)
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/relax_model/param_manager.py", line 292, in init_torch_pname_to_bin_name
    mapping = load_torch_pname2binname_map(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/giovannizinzi/Documents/localModel/mlc-llm/mlc_llm/relax_model/param_manager.py", line 920, in load_torch_pname2binname_map
    raise ValueError("Multiple weight shard files without json map is not supported")
ValueError: Multiple weight shard files without json map is not supported
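For reference, the error says the loader found several .bin shard files but no JSON map describing which parameter lives in which shard. A quick sanity check of the local directory (the pytorch_model.bin.index.json name is the usual Hugging Face convention for that map and is an assumption here, not confirmed against this repo):

# List the sharded torch checkpoints and look for the accompanying index file
ls dist/models/rwkv-raven-1b5/*.bin
ls dist/models/rwkv-raven-1b5/pytorch_model.bin.index.json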
Expected behavior
Would like the build to work.
Environment
- How you installed MLC-LLM (conda, source): yes
- How you installed TVM-Unity (pip, source): yes
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):