mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

Looks amazing! Where's the code to compile dist/libs? Would like to try on Intel macOS #6

Closed: TheBloke closed this issue 1 year ago

TheBloke commented 1 year ago

I tried WebLLM the other week and was really blown away. I have an Intel macOS system with AMD 6900XT GPU and using WebLLM was the first time I'd had decent GPU inference on this system.

Now I'd love to try mlc-llm as well. I followed the instructions, but the pre-built Metal lib for macOS is built for ARM64/Silicon.

Where can I find the source for this so I can try compiling it myself?

junrushao commented 1 year ago

Hey thanks for bringing this up!

I actually built a package for Intel Macs on conda yesterday. See the package named osx-64/mlc-chat-nightly here: https://anaconda.org/mlc-ai/mlc-chat-nightly/files.

As a newcomer to conda build, it's entirely possible I made some mistakes. Could you provide some further information to help me debug?

More specifically, could you share the output of the following commands:

conda create -n mlc-chat
conda activate mlc-chat
conda install -c mlc-ai -c conda-forge mlc-chat-nightly

TheBloke commented 1 year ago

Thanks for the fast reply!

I think the issue is not with mlc_chat_cli, which as you say is compiled for x64. Rather, it's vicuna-v1-7b_metal_float16.so, which is compiled for ARM64.

Here's my log from everything I've run. I just ran it all a second time to double-check I hadn't done anything wrong, hence the mlc-chat2 environment.

(base) tomj@marvin:/Users/tomj/src/docker$ conda create -n mlc-chat2
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/tomj/anaconda3/envs/mlc-chat2

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate mlc-chat2
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) tomj@marvin:/Users/tomj/src/docker$ conda activate mlc-chat2
(mlc-chat2) tomj@marvin:/Users/tomj/src/docker$ cd ~/src/mlc-llm/

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ conda install git git-lfs
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/tomj/anaconda3/envs/mlc-chat2

  added / updated specs:
    - git
    - git-lfs

The following NEW packages will be INSTALLED:

  bzip2              pkgs/main/osx-64::bzip2-1.0.8-h1de35cc_0
  c-ares             pkgs/main/osx-64::c-ares-1.19.0-h6c40b1e_0
  ca-certificates    pkgs/main/osx-64::ca-certificates-2023.01.10-hecd8cb5_0
  curl               pkgs/main/osx-64::curl-7.88.1-h6c40b1e_0
  expat              pkgs/main/osx-64::expat-2.4.9-he9d5cce_0
  gdbm               pkgs/main/osx-64::gdbm-1.18-hdccc71a_4
  gettext            pkgs/main/osx-64::gettext-0.21.0-he85b6c0_1
  git                pkgs/main/osx-64::git-2.34.1-pl5262h74264fa_0
  git-lfs            pkgs/main/osx-64::git-lfs-2.13.3-hecd8cb5_0
  icu                pkgs/main/osx-64::icu-58.2-h0a44026_3
  krb5               pkgs/main/osx-64::krb5-1.19.4-hdba6334_0
  libcurl            pkgs/main/osx-64::libcurl-7.88.1-ha585b31_0
  libcxx             pkgs/main/osx-64::libcxx-14.0.6-h9765a3e_0
  libedit            pkgs/main/osx-64::libedit-3.1.20221030-h6c40b1e_0
  libev              pkgs/main/osx-64::libev-4.33-h9ed2024_1
  libiconv           pkgs/main/osx-64::libiconv-1.16-hca72f7f_2
  libnghttp2         pkgs/main/osx-64::libnghttp2-1.46.0-ha29bfda_0
  libssh2            pkgs/main/osx-64::libssh2-1.10.0-h0a4fc7d_0
  libxml2            pkgs/main/osx-64::libxml2-2.10.3-h930c0e2_0
  llvm-openmp        pkgs/main/osx-64::llvm-openmp-14.0.6-h0dcd299_0
  ncurses            pkgs/main/osx-64::ncurses-6.4-hcec6c5f_0
  openssl            pkgs/main/osx-64::openssl-1.1.1t-hca72f7f_0
  pcre2              pkgs/main/osx-64::pcre2-10.37-he7042d7_1
  perl               pkgs/main/osx-64::perl-5.34.0-h435f0c2_2
  readline           pkgs/main/osx-64::readline-8.2-hca72f7f_0
  tk                 pkgs/main/osx-64::tk-8.6.12-h5d9f67b_0
  xz                 pkgs/main/osx-64::xz-5.2.10-h6c40b1e_1
  zlib               pkgs/main/osx-64::zlib-1.2.13-h4dc903c_0

Proceed ([y]/n)? y

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ conda install -c mlc-ai -c conda-forge mlc-chat-nightly
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/tomj/anaconda3/envs/mlc-chat2

  added / updated specs:
    - mlc-chat-nightly

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mlc-chat-nightly-0.1.dev2  |2_g630b061_h1234567_0         3.5 MB  mlc-ai
    ------------------------------------------------------------
                                           Total:         3.5 MB

The following NEW packages will be INSTALLED:

  mlc-chat-nightly   mlc-ai/osx-64::mlc-chat-nightly-0.1.dev2-2_g630b061_h1234567_0

The following packages will be UPDATED:

  libcxx                pkgs/main::libcxx-14.0.6-h9765a3e_0 --> conda-forge::libcxx-16.0.2-hd57cbcb_0

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2022.12.7-h033912b_0
  openssl              pkgs/main::openssl-1.1.1t-hca72f7f_0 --> conda-forge::openssl-1.1.1t-hfd90126_0

Proceed ([y]/n)? y

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ rm -rf dist

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ mkdir dist

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ git lfs install
Updated git hooks.
Git LFS initialized.

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
Cloning into 'dist/vicuna-v1-7b'...
remote: Enumerating objects: 149, done.
remote: Counting objects: 100% (149/149), done.
remote: Compressing objects: 100% (148/148), done.
remote: Total 149 (delta 4), reused 133 (delta 0), pack-reused 0
Receiving objects: 100% (149/149), 25.17 KiB | 8.39 MiB/s, done.
Resolving deltas: 100% (4/4), done.
Filtering content: 100% (133/133), 2.85 GiB | 31.11 MiB/s, done.

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib
Cloning into 'dist/lib'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 11 (delta 4), reused 8 (delta 4), pack-reused 0
Receiving objects: 100% (11/11), 588.09 KiB | 2.56 MiB/s, done.
Resolving deltas: 100% (4/4), done.

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ ls -al  dist/lib
total 23240
drwxr-xr-x  7 tomj staff     224 Apr 29 21:00 .
drwxr-xr-x  4 tomj staff     128 Apr 29 21:00 ..
drwxr-xr-x 12 tomj staff     384 Apr 29 21:00 .git
-rw-r--r--  1 tomj staff      21 Apr 29 21:00 README.md
-rwxr-xr-x  1 tomj staff 7796170 Apr 29 21:00 vicuna-v1-7b_metal_float16.so
-rwxr-xr-x  1 tomj staff 8014336 Apr 29 21:00 vicuna-v1-7b_vulkan_float16.dll
-rwxr-xr-x  1 tomj staff 7978912 Apr 29 21:00 vicuna-v1-7b_vulkan_float16.so

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ file  dist/lib/vicuna-v1-7b_metal_float16.so
dist/lib/vicuna-v1-7b_metal_float16.so: Mach-O 64-bit dynamically linked shared library arm64

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ mlc_chat_cli
Use lib /Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so
[21:01:16] /Users/runner/work/utils/utils/tvm/src/runtime/dso_library.cc:125:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (lib_handle_ != nullptr) is false: Failed to load dynamic shared library /Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so dlopen(/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so, 0x0005): tried: '/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64h' or 'x86_64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so' (no such file), '/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64h' or 'x86_64'))
Stack trace:
  [bt] (0) 1   libtvm_runtime.dylib                0x0000000108f3fc98 tvm::runtime::Backtrace() + 24
  [bt] (1) 2   libtvm_runtime.dylib                0x0000000108f0c929 tvm::runtime::detail::LogFatal::Entry::Finalize() + 89
  [bt] (2) 3   libtvm_runtime.dylib                0x0000000108f0c8c9 tvm::runtime::detail::LogFatal::~LogFatal() + 25
  [bt] (3) 4   libtvm_runtime.dylib                0x0000000108f07159 tvm::runtime::detail::LogFatal::~LogFatal() + 9
  [bt] (4) 5   libtvm_runtime.dylib                0x0000000108f2db2d tvm::runtime::DSOLibrary::Load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 269
  [bt] (5) 6   libtvm_runtime.dylib                0x0000000108f2dd2f tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::$_0>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 175
  [bt] (6) 7   libtvm_runtime.dylib                0x0000000108f479b6 tvm::runtime::Module::LoadFromFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 598
  [bt] (7) 8   mlc_chat_cli                        0x0000000108c684c2 main + 8402
  [bt] (8) 9   dyld                                0x00007ff808ef341f start + 1903

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ which mlc_chat_cli
/Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ ls -al /Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli
-rwxrwxr-x 2 tomj staff 203288 Apr 29 19:00 /Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ file  /Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli
/Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli: Mach-O 64-bit executable x86_64

The key part for me is this:

(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ file  dist/lib/vicuna-v1-7b_metal_float16.so
dist/lib/vicuna-v1-7b_metal_float16.so: Mach-O 64-bit dynamically linked shared library arm64

The instructions have provided an ARM64 library, which I won't be able to run. It's fetched from https://github.com/mlc-ai/binary-mlc-llm-libs.git, but I couldn't find any source code associated with that lib, so I wasn't sure whether I could try re-compiling it for x64 (sorry if I missed it anywhere).
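
For anyone else checking their binaries, Apple's lipo tool reports the same information as file:

lipo -archs dist/lib/vicuna-v1-7b_metal_float16.so
arm64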

junrushao commented 1 year ago

Ah got it! The stack trace is super informative and helpful!

The issue comes from host code generation (the part that launches the GPU kernels): we forgot to tweak the LLVM parameters to generate x86 code. It should be easy to fix. @tqchen and I are working on it!
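
For reference, the gist of the fix is pinning the host side of the TVM target. A minimal sketch using TVM's Python API (the module name `kernels` is illustrative, not the exact build code):

import tvm

# "metal" compiles the GPU kernels; the host triple controls the LLVM codegen
# for the CPU-side launcher code that mlc_chat_cli dlopen()s. Intel Macs need
# x86_64-apple-darwin; Apple Silicon would use arm64-apple-darwin.
target = tvm.target.Target("metal", host="llvm -mtriple=x86_64-apple-darwin")

rt_mod = tvm.build(kernels, target=target)  # `kernels`: the TIR module (illustrative name)
rt_mod.export_library("vicuna-v1-7b_metal_float16.so")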

TheBloke commented 1 year ago

That's great to hear, thank you so much!

junrushao commented 1 year ago

@TheBloke we just sent two fixes (referenced above) and it now works on my Intel MacBook :-) We'll need an extra ~30 minutes to rebuild the conda package, then we'll validate again!

TheBloke commented 1 year ago

Wonderful, thanks!

junrushao commented 1 year ago

Just validated that the new conda packages work on both x86 and ARM MacBooks!

Intel MacBook: [screenshot]

ARM MacBook: [screenshot]

@TheBloke you will have to upgrade to the latest mlc-chat-nightly package from conda and update the dist/lib directory to make it work :-)
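
Assuming the same channels as in the original instructions, the upgrade should be something like:

conda update -c mlc-ai -c conda-forge mlc-chat-nightly
cd dist/lib && git pull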

TheBloke commented 1 year ago

It's working! I grabbed the new dylib from the commit and it runs beautifully. Thanks, guys!

[screenshot]

junrushao commented 1 year ago

Thanks for reporting this issue! Extremely helpful and timely for us to improve our packaging experience!

TheBloke commented 1 year ago

So cool!
[screenshot]

> Thanks for reporting this issue! Extremely helpful and timely for us to improve our packaging experience!

Not at all, thanks so much for the very quick response and the amazing code.

Will you guys be publishing the source code for the metal lib as well? Or is that already in another repo that I missed?

The next thing I'd love to try is expanding this to other models. I release lots of models over on HF (https://huggingface.co/TheBloke) and I'd love to look into making versions of those for mlc/WebLLM and publishing them on HF, e.g. Vicuna 1.1 13B, WizardLM 7B, StableVicuna, OpenAssistant, etc.

I saw the code over on WebLLM for building models. Is that the same code that's used to make the dist/vicuna shards we use here?

tqchen commented 1 year ago

The metallib is generated through the TVM pipeline, which also lets it be universally optimized and mapped to the backend of interest, including Vulkan and Metal. So the sources are already in this repo. A lot of the current dispatchers are here: https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/transform/dispatch_tir_operator.py#L3
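
At runtime the CLI simply loads that compiled artifact as a TVM module (this is the Module::LoadFromFile call in your stack trace). In Python terms, shown for illustration:

import tvm

# Loads the compiled kernel library the same way mlc_chat_cli does via dlopen.
mod = tvm.runtime.load_module("dist/lib/vicuna-v1-7b_metal_float16.so")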

Yes, WebLLM and MLC-LLM share most of their code, and the shards are the same. We'd love to work with the community to bring support for more models and enable more people to do it together. Check out https://mlc.ai/ for some of the background info.

junrushao commented 1 year ago

@TheBloke Hey, just wanted to further clarify one thing: there are two levels of IRs in the TVM Unity compiler. The graph-level one is called Relax, and the loop-level one is called TensorIR, or TIR. Depending on how deeply you want to customize, you may use only one or both:

> publishing the source code for the metal lib as well

As you might already be able to tell, the kernels are open-sourced as TIR functions already, which means we didn't write a single line of Vulkan, CUDA, or Metal by hand: the metal lib is simply the binary artifact built by the TVM Unity compiler, which handles everything.
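
For a flavor of what the two levels look like, here's a toy IRModule in TVMScript with one TIR kernel and one Relax function calling it (illustrative only, not the actual vicuna kernels):

from tvm.script import ir as I, relax as R, tir as T

@I.ir_module
class TwoLevelExample:
    # Loop-level IR (TIR): an explicit elementwise kernel.
    @T.prim_func
    def add_one(A: T.Buffer((8,), "float16"), B: T.Buffer((8,), "float16")):
        for i in range(8):
            with T.block("B"):
                vi = T.axis.spatial(8, i)
                B[vi] = A[vi] + T.float16(1)

    # Graph-level IR (Relax): stitches TIR kernels into a computation graph.
    @R.function
    def main(x: R.Tensor((8,), "float16")) -> R.Tensor((8,), "float16"):
        cls = TwoLevelExample
        y = R.call_tir(cls.add_one, (x,), out_sinfo=R.Tensor((8,), "float16"))
        return y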

TheBloke commented 1 year ago

Ahh I see, OK wonderful. Thanks very much for the detailed explanations, @tqchen and @junrushao!

That's a really cool system!

I will have a go at making my own model and let you know if I succeed.

Thanks again for the great support and for all the work you guys have done on this project. It really is a great idea and could be a game-changer for everyone who doesn't have an NVIDIA GPU and is currently left out of easy GPU inference.