mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] mlc_llm serve error on Mac M1 (git clone failed with error 128) #2938

Open pchalasani opened 1 week ago

pchalasani commented 1 week ago

🐛 Bug

Running mlc_llm serve with an HF:// model URL fails on Mac M1: the internal git clone of the model repo exits with status 128 (see title and the trace below).

To Reproduce

Steps to reproduce the behavior:

  1. follow the instructions to install mlc from nightly (NOT using conda, just in my venv); the install command is sketched after this list
  2. run mlc_llm serve HF://mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC
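
For context, the nightly pip install from the MLC docs is along these lines (the macOS/Metal package names here are my assumption from the docs at the time and may differ per platform):

python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly mlc-ai-nightly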

Expected behavior

The command should download the model from HuggingFace and start the server.

Environment

Additional context

error trace:

[2024-09-24 09:45:08] INFO auto_device.py:88: Not found device: cuda:0
[2024-09-24 09:45:09] INFO auto_device.py:88: Not found device: rocm:0
[2024-09-24 09:45:10] INFO auto_device.py:79: Found device: metal:0
[2024-09-24 09:45:10] INFO auto_device.py:88: Not found device: vulkan:0
[2024-09-24 09:45:11] INFO auto_device.py:88: Not found device: opencl:0
[2024-09-24 09:45:11] INFO auto_device.py:35: Using device: metal:0
[2024-09-24 09:45:11] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC
[2024-09-24 09:45:11] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-09-24 09:45:11] INFO download_cache.py:56: [Git] Cloning https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git to /var/folders/dx/39xz0fk938zftc3djhbm78bc0000gn/T/tmpjpli117j/tmp
Traceback (most recent call last):
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/support/download_cache.py", line 57, in git_clone
    subprocess.run(
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'clone', 'https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git', '.tmp']' returned non-zero exit status 128.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/pchalasani/Git/langroid/.venv/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/__main__.py", line 49, in main
    cli.main(sys.argv[2:])
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/cli/serve.py", line 204, in main
    serve(
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/interface/serve.py", line 55, in serve
    async_engine = engine.AsyncMLCEngine(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine.py", line 896, in __init__
    super().__init__(
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 171, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 125, in _convert_model_info
    model_path = download_cache.get_or_download_model(model.model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/support/download_cache.py", line 228, in get_or_download_model
    model_path = download_and_cache_mlc_weights(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/support/download_cache.py", line 180, in download_and_cache_mlc_weights
    git_clone(git_url, tmp_dir, ignore_lfs=True)
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/support/download_cache.py", line 70, in git_clone
    raise ValueError(
ValueError: Git clone failed with return code 128: None. The command was: ['git', 'clone', 'https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git', '.tmp']
Exception ignored in: <function MLCEngineBase.__del__ at 0x127f94540>
Traceback (most recent call last):
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 654, in __del__
    self.terminate()
  File "/Users/pchalasani/Git/langroid/.venv/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 661, in terminate
    self._ffi["exit_background_loop"]()
    ^^^^^^^^^
AttributeError: 'AsyncMLCEngine' object has no attribute '_ffi'
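
Note: the ValueError above prints the clone's stderr as "None", so git's actual complaint is lost. A minimal diagnostic sketch (not part of mlc_llm; the URL is taken from the trace) that re-runs the same clone with output captured:

import subprocess
import tempfile

# Re-run the clone that download_cache.py attempts, capturing git's
# stderr so the real cause of exit status 128 becomes visible.
url = "https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git"
with tempfile.TemporaryDirectory() as tmp_dir:
    result = subprocess.run(
        ["git", "clone", url, ".tmp"],
        cwd=tmp_dir,
        capture_output=True,
        text=True,
    )
    print("exit status:", result.returncode)
    print(result.stderr)  # typically an auth, proxy, or DNS message from git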
shahizat commented 6 days ago

Hello, I am experiencing the same issue on the NVIDIA Jetson AGX Orin 64GB Developer Kit.

[2024-09-29 17:09:43] INFO auto_device.py:79: Found device: cuda:0
[2024-09-29 17:09:45] INFO auto_device.py:88: Not found device: rocm:0
[2024-09-29 17:09:47] INFO auto_device.py:88: Not found device: metal:0
[2024-09-29 17:09:49] INFO auto_device.py:88: Not found device: vulkan:0
[2024-09-29 17:09:51] INFO auto_device.py:88: Not found device: opencl:0
[2024-09-29 17:09:51] INFO auto_device.py:35: Using device: cuda:0
[2024-09-29 17:09:51] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC
[2024-09-29 17:09:51] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-09-29 17:09:51] INFO download_cache.py:56: [Git] Cloning https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC.git to /tmp/tmp39iovrkp/tmp
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 57, in git_clone
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'clone', 'https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC.git', '.tmp']' returned non-zero exit status 128.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 49, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/serve.py", line 204, in main
    serve(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/serve.py", line 55, in serve
    async_engine = engine.AsyncMLCEngine(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine.py", line 896, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 125, in _convert_model_info
    model_path = download_cache.get_or_download_model(model.model)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 228, in get_or_download_model
    model_path = download_and_cache_mlc_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 180, in download_and_cache_mlc_weights
    git_clone(git_url, tmp_dir, ignore_lfs=True)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/download_cache.py", line 70, in git_clone
    raise ValueError(
ValueError: Git clone failed with return code 128: None. The command was: ['git', 'clone', 'https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC.git', '.tmp']
Exception ignored in: <function MLCEngineBase.__del__ at 0xffff36b43130>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 654, in __del__
    self.terminate()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 661, in terminate
    self._ffi["exit_background_loop"]()
AttributeError: 'AsyncMLCEngine' object has no attribute '_ffi'
rickzx commented 5 days ago

Hi @pchalasani @shahizat I was not able to reproduce this error on my Mac. I suspect it is a git configuration issue. Can you try running directly:

git clone https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git

and see if that works?
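
If the manual clone also fails, git's own message should point at the cause of the 128; common culprits are credentials, a proxy, or a hung interactive prompt. Disabling the prompt makes that last case explicit:

GIT_TERMINAL_PROMPT=0 git clone https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git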

pchalasani commented 5 days ago

@rickzx yes, the git clone works. Can we use mlc_llm serve and point it directly at the locally cloned model, rather than the HF:// argument?
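
For example, something along these lines, assuming serve accepts a local MLC model directory (the path is hypothetical):

mlc_llm serve ./Qwen2.5-32B-Instruct-q4f32_1-MLC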

ptrkstr commented 4 days ago

@pchalasani you may be missing the dependencies in step 1 here

pchalasani commented 4 days ago

> @pchalasani you may be missing the dependencies in step 1 here

Thanks, but I'm not using it on iOS. Please let me know if these dependencies are needed for my scenario (I followed the docs precisely, and this was not mentioned).

ptrkstr commented 4 days ago

Ahh, apologies. I ran into a similar error, and installing git-lfs is what worked for me, but it sounds like that's not applicable to you.
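
In case it helps anyone else who does hit the LFS variant of this, what worked for me was roughly the following (Homebrew on macOS assumed; use your distro's package manager elsewhere):

brew install git-lfs    # or: sudo apt-get install git-lfs
git lfs install         # registers the LFS filters in ~/.gitconfig
git clone https://huggingface.co/mlc-ai/Qwen2.5-32B-Instruct-q4f32_1-MLC.git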