mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Question] Running mlc_llm into a multi-phase container build #2512

Open · oglok opened 1 month ago

oglok commented 1 month ago

❓ General Questions

I'm trying to build a containerized application with vicuna-7b and mlc-llm for Jetson devices running JetPack 6 (JP6). This is my multi-phase Containerfile:

FROM docker.io/dustynv/mlc:r36.2.0 AS builder

# Install and enable git-lfs
RUN apt-get update && apt-get install -y git-lfs
RUN git lfs install

# Clone local copy of Vicuna-7b-v1.5
RUN mkdir /opt/models
RUN cd /opt/models && git clone https://huggingface.co/lmsys/vicuna-7b-v1.5

# Install grpcio-tools and compile protobuf protocols
RUN pip install grpcio-tools
COPY ./protobuf /protobuf
RUN cd /protobuf/ && python3 -m grpc_tools.protoc -I./ --python_out=. --pyi_out=. --grpc_python_out=. ./vicunaserving.proto

# Compile vicuna-7b-v1.5
RUN python3 -m mlc_llm.build \
    --model vicuna-7b-v1.5 \
    --quantization q4f16_ft \
    --artifact-path /opt/ \
    --max-seq-len 4096 \
    --target cuda

### End builder image build ###

# Start main image build
FROM nvcr.io/nvidia/l4t-base:r36.2.0
# Copy vicunaserver, compiled model, and whl files for pip install
COPY ./vicunaserver/ /opt/vicunaserver/
COPY --from=builder /opt/vicuna-7b-v1.5-q4f16_ft/ /opt/vicunaserver/
COPY --from=builder /opt/mlc_llm-0.1.dev930+g607dc5a-py3-none-any.whl /tmp
COPY --from=builder /opt/torch-2.1.0-cp310-cp310-linux_aarch64.whl /tmp
COPY --from=builder /opt/torchvision-0.16.0+fbb4cc5-cp310-cp310-linux_aarch64.whl /tmp
COPY --from=builder /opt/mlc_chat-0.1.dev930+g607dc5a-cp310-cp310-linux_aarch64.whl /tmp
COPY --from=builder /opt/tvm-0.15.dev48+g59c355604-cp310-cp310-linux_aarch64.whl /tmp

# Install dependencies with apt/pip
RUN apt update && apt install -y python3-pip
RUN python3.10 -m pip install /tmp/*.whl

# Install essential CUDA packages
RUN apt-cache search cuda-*
RUN apt install -y --no-install-recommends --no-install-suggests cuda-minimal-build-12-2 cuda-nvrtc-12-2 libcudnn8 libcublas-12-2 libcurand-12-2

# Copy protobuf defs
COPY --from=builder /protobuf/ /opt/vicunaserver

# Set workdir to where our server is
WORKDIR /opt/vicunaserver/

# Set default CMD to run our server
CMD python3 vicunaserver.py

When I run the container, I get the following error:

root@402c73bca1f5:/opt/vicunaserver# python3 vicunaserver.py
Traceback (most recent call last):
  File "/opt/vicunaserver/vicunaserver.py", line 1, in <module>
    from mlc_chat import ChatModule, ChatConfig, ConvConfig
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/__init__.py", line 5, in <module>
    from . import protocol, serve
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/serve/__init__.py", line 4, in <module>
    from .. import base
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/base.py", line 6, in <module>
    import tvm
  File "/usr/local/lib/python3.10/dist-packages/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 78, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 64, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libfpA_intB_gemm.so: cannot open shared object file: No such file or directory

I see that library is located in:

root@402c73bca1f5:/opt/vicunaserver# ll /usr/local/lib/python3.10/dist-packages/tvm/libfpA_intB_gemm.so
-rwxr-xr-x. 1 root root 190683216 Jun  5 09:04 /usr/local/lib/python3.10/dist-packages/tvm/libfpA_intB_gemm.so*

So if I add export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tvm/:$LD_LIBRARY_PATH, my vicuna server starts:

root@402c73bca1f5:/opt/vicunaserver# python3 vicunaserver.py
[2024-06-05 09:38:00] INFO vicunaserver.py:51: Starting server on [::]:50051
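
To avoid exporting that variable by hand every time, the loader path could also be baked into the runtime stage of the Containerfile above. A minimal sketch, assuming the TVM wheel keeps installing libfpA_intB_gemm.so under the same dist-packages path:

# Hypothetical addition to the runtime stage: put the TVM native libs on the loader path
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tvm:${LD_LIBRARY_PATH}

# Alternatively, register the directory with the dynamic linker instead of using LD_LIBRARY_PATH
RUN echo "/usr/local/lib/python3.10/dist-packages/tvm" > /etc/ld.so.conf.d/tvm.conf && ldconfig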

However, when I actually make use of it (the demo is basically a webserver, YOLOv8, and this vicuna server), I get an error which seems to be about some dependency mismatch:

root@vicunaserver-5ff46766b9-mznrs:/opt/vicunaserver# python3 vicunaserver.py
[2024-06-05 10:19:34] INFO vicunaserver.py:51: Starting server on [::]:50051
[2024-06-05 10:19:45] INFO vicunaserver.py:15: Serving the requested inferencing
[2024-06-05 10:19:47] INFO auto_device.py:76: Found device: cuda:0
[2024-06-05 10:19:49] INFO auto_device.py:85: Not found device: rocm:0
[2024-06-05 10:19:51] INFO auto_device.py:85: Not found device: metal:0
[2024-06-05 10:19:53] INFO auto_device.py:85: Not found device: vulkan:0
[2024-06-05 10:19:55] INFO auto_device.py:85: Not found device: opencl:0
[2024-06-05 10:19:55] INFO auto_device.py:33: Using device: cuda:0
[2024-06-05 10:19:55] INFO chat_module.py:373: Using model folder: /opt/vicunaserver/params
[2024-06-05 10:19:55] INFO chat_module.py:374: Using mlc chat config: /opt/vicunaserver/params/mlc-chat-config.json
[2024-06-05 10:19:55] INFO chat_module.py:560: Using library model: /opt/vicunaserver/vicuna-7b-v1.5-q4f16_ft-cuda.so
[2024-06-05 10:19:57] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 160, in main
    metadata = _extract_metadata(parsed.model_lib)
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 26, in _extract_metadata
    return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
  File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/relax_vm.py", line 136, in __getitem__
    return self.module[key]
  File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 192, in __getitem__
    return self.get_function(name)
  File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 176, in get_function
    raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function '_metadata'

Any clue?

It seems to be complaining about this function: https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/cli/model_metadata.py#L20
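
For reference, that metadata probe can also be run on its own against the compiled library; this is a sketch assuming the mlc_chat.cli.model_metadata module from the traceback above accepts the model lib path as its only argument. On libraries built with the legacy mlc_llm.build flow it fails the same way, because they do not export a _metadata function:

# Hypothetical standalone check of the metadata section in the compiled model lib
# (same code path as the traceback above; legacy libs lack the _metadata function)
python3 -m mlc_chat.cli.model_metadata /opt/vicunaserver/vicuna-7b-v1.5-q4f16_ft-cuda.so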

oglok commented 1 month ago

BTW, I tried using dustynv's image as the base instead of the l4t-base one, and I'm getting the same results :-(

tqchen commented 1 month ago

Hi @oglok, it seems you were using an older version that is now being deprecated.
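
For reference, the legacy python3 -m mlc_llm.build entry point used in the Containerfile above has been superseded by the mlc_llm CLI. A rough sketch of the equivalent steps follows; the output directory, the q4f16_1 quantization code, and the vicuna_v1.1 conversation template are assumptions, so check the docs at https://llm.mlc.ai/ for the exact flags:

# Convert and quantize the weights (output directory is hypothetical)
python3 -m mlc_llm convert_weight /opt/models/vicuna-7b-v1.5 \
    --quantization q4f16_1 \
    -o /opt/vicuna-7b-v1.5-MLC

# Generate mlc-chat-config.json (conversation template assumed to be vicuna_v1.1)
python3 -m mlc_llm gen_config /opt/models/vicuna-7b-v1.5 \
    --quantization q4f16_1 \
    --conv-template vicuna_v1.1 \
    -o /opt/vicuna-7b-v1.5-MLC

# Compile the model library for CUDA
python3 -m mlc_llm compile /opt/vicuna-7b-v1.5-MLC/mlc-chat-config.json \
    --device cuda \
    -o /opt/vicuna-7b-v1.5-MLC/vicuna-7b-v1.5-q4f16_1-cuda.so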

oglok commented 1 month ago

Hey @tqchen, is there a container image I can use?

oglok commented 1 month ago

I see Dockerfiles in the mlc-ai/packages and mlc-ai/env repos, but nothing ready to use...

tqchen commented 1 month ago

As of now we unfortunately don't have a container file, so building from source for Jetson may be needed.
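
The build-from-source route is roughly the following; this is a sketch of the steps documented at https://llm.mlc.ai/docs (the cmake configuration is generated interactively, and CUDA needs to be enabled there for Jetson):

# Clone with submodules (the TVM Unity compiler is vendored as a submodule)
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm

# Generate config.cmake (enable CUDA when prompted) and build the native libraries
mkdir -p build && cd build
python3 ../cmake/gen_cmake_config.py
cmake .. && cmake --build . --parallel $(nproc)
cd ..

# Install the Python package that wraps the freshly built libraries
cd python && pip3 install -e .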

oglok commented 1 month ago

@tqchen I realize there is no wheel package with CUDA for ARM devices. Am I the only one interested in running this on a Jetson?

dusty-nv commented 1 month ago

@oglok jetson-containers builds MLC from source. There are some patches I apply (mostly to the MLC/TVM 3rd-party submodules), so it's not on the latest, but there is a version using the mlc_chat builder (what I have tagged as version 0.1.1).

Also, IIRC AttributeError: Module has no function '_metadata' is just a warning, and the program typically continues on without issue after that. Does it halt for you, or do you have some other problem?

oglok commented 1 month ago

Actually, you are right @dusty-nv. The container does not halt; I thought it wasn't behaving properly, but apparently it is.

What's the problem with generating wheel packages for CUDA on ARM, or for Jetson?

dusty-nv commented 1 month ago

@oglok the MLC/TVM wheels that jetson-containers builds are here: http://jetson.webredirect.org/jp6/cu122

It is a non-trivial build process with all the bells & whistles enabled, and as you have found there are extra dependencies and files you need to install.
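
For anyone who just wants the prebuilt Jetson wheels, they can be pulled straight from that index. A sketch, assuming the package names match the wheel files copied in the Containerfile above (the mirror is plain HTTP, hence --trusted-host):

# Hypothetical install from the jetson-containers wheel index linked above
pip3 install --extra-index-url http://jetson.webredirect.org/jp6/cu122 \
    --trusted-host jetson.webredirect.org \
    mlc-llm mlc-chat tvm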

oglok commented 1 month ago

> @oglok the MLC/TVM wheels that jetson-containers builds are here: http://jetson.webredirect.org/jp6/cu122
>
> It is a non-trivial build process with all the bells & whistles enabled, and as you have found there are extra dependencies and files you need to install.

❤️