pchalasani opened this issue 1 week ago (Open)
Hello, I am experiencing the same issue on the NVIDIA Jetson AGX Orin 64GB Developer Kit.
[2024-09-29 17:09:43] INFO auto_device.py:79: Found device: cuda:0
[2024-09-29 17:09:45] INFO auto_device.py:88: Not found device: rocm:0
[2024-09-29 17:09:47] INFO auto_device.py:88: Not found device: metal:0
[2024-09-29 17:09:49] INFO auto_device.py:88: Not found device: vulkan:0
[2024-09-29 17:09:51] INFO auto_device.py:88: Not found device: opencl:0
[2024-09-29 17:09:51] INFO auto_device.py:35: Using device: cuda:0
[2024-09-29 17:09:51] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC
[2024-09-29 17:09:51] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-09-29 17:09:51] INFO download_cache.py:56: [Git] Cloning https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC.git to /tmp/tmp39iovrkp/tmp
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 57, in git_clone
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'clone', 'https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC.git', '.tmp']' returned non-zero exit status 128.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 49, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/serve.py", line 204, in main
    serve(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/serve.py", line 55, in serve
    async_engine = engine.AsyncMLCEngine(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine.py", line 896, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 125, in _convert_model_info
    model_path = download_cache.get_or_download_model(model.model)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 228, in get_or_download_model
    model_path = download_and_cache_mlc_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 180, in download_and_cache_mlc_weights
    git_clone(git_url, tmp_dir, ignore_lfs=True)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 70, in git_clone
    raise ValueError(
ValueError: Git clone failed with return code 128: None. The command was: ['git', 'clone', 'https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC.git', '.tmp']

Exception ignored in: <function MLCEngineBase.__del__ at 0xffff36b43130>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 654, in __del__
    self.terminate()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 661, in terminate
    self._ffi["exit_background_loop"]()
AttributeError: 'AsyncMLCEngine' object has no attribute '_ffi'
Hi @pchalasani @shahizat, I was not able to reproduce the same error on my Mac. I suspect this is due to a git configuration issue. Can you try directly running:
git clone https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git
and see if that works?
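If the clone fails, git's own tracing can show why (a rough diagnostic sketch; these are standard git debugging variables, not MLC-specific, and exit status 128 from git clone is a generic fatal error that can mean auth, network, or LFS filter problems):

git --version
git lfs version
GIT_TRACE=1 GIT_CURL_VERBOSE=1 git clone https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git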
@rickzx yes, the git clone works. Can we use mlc_llm serve and point it directly at the locally cloned model, rather than the HF... argument?
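Something like this is what I have in mind (a sketch on my end, not yet tested; assuming mlc_llm serve accepts a local weights directory in place of the HF URL):

git clone https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git
mlc_llm serve ./Qwen2.5-32B-Instruct-q4f32_1-MLC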
@pchalasani you may be missing the dependencies in step 1 here
Thanks, but I'm not using it on iOS. Please let me know if these dependencies are needed for my scenario (I followed the docs precisely, and this was not mentioned).
Ahhh, apologies. I received a similar error and installing git-lfs is what worked for me, but it sounds like that's not applicable to you.
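For anyone else who does hit the LFS variant of this error, the install on Debian/Ubuntu-style systems is typically (adjust for your package manager):

sudo apt-get install git-lfs
git lfs install    # registers the LFS filters in your git config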
🐛 Bug
see title
To Reproduce
Steps to reproduce the behavior:
mlc_llm serve HF://mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC
Expected behavior
should work
Environment
Platform: Mac M1 Max
Operating system: macOS Sonoma 14.2.1
How you installed MLC-LLM (conda, source): ran this with my venv activated (did NOT use conda, not sure if that matters)
How you installed TVM-Unity (pip, source): did not install this
Python version (e.g. 3.10): 3.11
Additional context
error trace: