Closed: TheBloke closed this issue 1 year ago.
Hey thanks for bringing this up!
I actually built a package yesterday for Intel Mac on conda. See the package named osx-64/mlc-chat-nightly
there: https://anaconda.org/mlc-ai/mlc-chat-nightly/files.
As a newcomer to conda build, I think it's entirely possible that I made some mistakes. Would you mind providing some further information to help me debug?
More specifically, could you share the output of the following commands:
conda create -n mlc-chat
conda activate mlc-chat
conda install -c mlc-ai -c conda-forge mlc-chat-nightly
Thanks for the fast reply!
I think the issue is not with mlc_chat_cli, which as you say is x64-compiled. Rather, it's vicuna-v1-7b_metal_float16.so, which is ARM64-compiled.
Here's my log from everything I've run. I just ran it a second time to double-check I hadn't done anything wrong, hence the mlc-chat2 environment.
(base) tomj@marvin:/Users/tomj/src/docker$ conda create -n mlc-chat2
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/tomj/anaconda3/envs/mlc-chat2
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate mlc-chat2
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) tomj@marvin:/Users/tomj/src/docker$ conda activate mlc-chat2
(mlc-chat2) tomj@marvin:/Users/tomj/src/docker$ cd ~/src/mlc-llm/
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ conda install git git-lfs
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/tomj/anaconda3/envs/mlc-chat2
added / updated specs:
- git
- git-lfs
The following NEW packages will be INSTALLED:
bzip2 pkgs/main/osx-64::bzip2-1.0.8-h1de35cc_0
c-ares pkgs/main/osx-64::c-ares-1.19.0-h6c40b1e_0
ca-certificates pkgs/main/osx-64::ca-certificates-2023.01.10-hecd8cb5_0
curl pkgs/main/osx-64::curl-7.88.1-h6c40b1e_0
expat pkgs/main/osx-64::expat-2.4.9-he9d5cce_0
gdbm pkgs/main/osx-64::gdbm-1.18-hdccc71a_4
gettext pkgs/main/osx-64::gettext-0.21.0-he85b6c0_1
git pkgs/main/osx-64::git-2.34.1-pl5262h74264fa_0
git-lfs pkgs/main/osx-64::git-lfs-2.13.3-hecd8cb5_0
icu pkgs/main/osx-64::icu-58.2-h0a44026_3
krb5 pkgs/main/osx-64::krb5-1.19.4-hdba6334_0
libcurl pkgs/main/osx-64::libcurl-7.88.1-ha585b31_0
libcxx pkgs/main/osx-64::libcxx-14.0.6-h9765a3e_0
libedit pkgs/main/osx-64::libedit-3.1.20221030-h6c40b1e_0
libev pkgs/main/osx-64::libev-4.33-h9ed2024_1
libiconv pkgs/main/osx-64::libiconv-1.16-hca72f7f_2
libnghttp2 pkgs/main/osx-64::libnghttp2-1.46.0-ha29bfda_0
libssh2 pkgs/main/osx-64::libssh2-1.10.0-h0a4fc7d_0
libxml2 pkgs/main/osx-64::libxml2-2.10.3-h930c0e2_0
llvm-openmp pkgs/main/osx-64::llvm-openmp-14.0.6-h0dcd299_0
ncurses pkgs/main/osx-64::ncurses-6.4-hcec6c5f_0
openssl pkgs/main/osx-64::openssl-1.1.1t-hca72f7f_0
pcre2 pkgs/main/osx-64::pcre2-10.37-he7042d7_1
perl pkgs/main/osx-64::perl-5.34.0-h435f0c2_2
readline pkgs/main/osx-64::readline-8.2-hca72f7f_0
tk pkgs/main/osx-64::tk-8.6.12-h5d9f67b_0
xz pkgs/main/osx-64::xz-5.2.10-h6c40b1e_1
zlib pkgs/main/osx-64::zlib-1.2.13-h4dc903c_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ conda install -c mlc-ai -c conda-forge mlc-chat-nightly
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/tomj/anaconda3/envs/mlc-chat2
added / updated specs:
- mlc-chat-nightly
The following packages will be downloaded:
package | build
---------------------------|-----------------
mlc-chat-nightly-0.1.dev2 |2_g630b061_h1234567_0 3.5 MB mlc-ai
------------------------------------------------------------
Total: 3.5 MB
The following NEW packages will be INSTALLED:
mlc-chat-nightly mlc-ai/osx-64::mlc-chat-nightly-0.1.dev2-2_g630b061_h1234567_0
The following packages will be UPDATED:
libcxx pkgs/main::libcxx-14.0.6-h9765a3e_0 --> conda-forge::libcxx-16.0.2-hd57cbcb_0
The following packages will be SUPERSEDED by a higher-priority channel:
ca-certificates pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2022.12.7-h033912b_0
openssl pkgs/main::openssl-1.1.1t-hca72f7f_0 --> conda-forge::openssl-1.1.1t-hfd90126_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ rm -rf dist
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ mkdir dist
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ git lfs install
Updated git hooks.
Git LFS initialized.
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
Cloning into 'dist/vicuna-v1-7b'...
remote: Enumerating objects: 149, done.
remote: Counting objects: 100% (149/149), done.
remote: Compressing objects: 100% (148/148), done.
remote: Total 149 (delta 4), reused 133 (delta 0), pack-reused 0
Receiving objects: 100% (149/149), 25.17 KiB | 8.39 MiB/s, done.
Resolving deltas: 100% (4/4), done.
Filtering content: 100% (133/133), 2.85 GiB | 31.11 MiB/s, done.
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib
Cloning into 'dist/lib'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 11 (delta 4), reused 8 (delta 4), pack-reused 0
Receiving objects: 100% (11/11), 588.09 KiB | 2.56 MiB/s, done.
Resolving deltas: 100% (4/4), done.
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ ls -al dist/lib
total 23240
drwxr-xr-x 7 tomj staff 224 Apr 29 21:00 .
drwxr-xr-x 4 tomj staff 128 Apr 29 21:00 ..
drwxr-xr-x 12 tomj staff 384 Apr 29 21:00 .git
-rw-r--r-- 1 tomj staff 21 Apr 29 21:00 README.md
-rwxr-xr-x 1 tomj staff 7796170 Apr 29 21:00 vicuna-v1-7b_metal_float16.so
-rwxr-xr-x 1 tomj staff 8014336 Apr 29 21:00 vicuna-v1-7b_vulkan_float16.dll
-rwxr-xr-x 1 tomj staff 7978912 Apr 29 21:00 vicuna-v1-7b_vulkan_float16.so
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ file dist/lib/vicuna-v1-7b_metal_float16.so
dist/lib/vicuna-v1-7b_metal_float16.so: Mach-O 64-bit dynamically linked shared library arm64
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ mlc_chat_cli
Use lib /Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so
[21:01:16] /Users/runner/work/utils/utils/tvm/src/runtime/dso_library.cc:125:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (lib_handle_ != nullptr) is false: Failed to load dynamic shared library /Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so dlopen(/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so, 0x0005): tried: '/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64h' or 'x86_64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so' (no such file), '/Users/tomj/src/mlc-llm/dist/lib/vicuna-v1-7b_metal_float16.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64h' or 'x86_64'))
Stack trace:
[bt] (0) 1 libtvm_runtime.dylib 0x0000000108f3fc98 tvm::runtime::Backtrace() + 24
[bt] (1) 2 libtvm_runtime.dylib 0x0000000108f0c929 tvm::runtime::detail::LogFatal::Entry::Finalize() + 89
[bt] (2) 3 libtvm_runtime.dylib 0x0000000108f0c8c9 tvm::runtime::detail::LogFatal::~LogFatal() + 25
[bt] (3) 4 libtvm_runtime.dylib 0x0000000108f07159 tvm::runtime::detail::LogFatal::~LogFatal() + 9
[bt] (4) 5 libtvm_runtime.dylib 0x0000000108f2db2d tvm::runtime::DSOLibrary::Load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 269
[bt] (5) 6 libtvm_runtime.dylib 0x0000000108f2dd2f tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::$_0>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 175
[bt] (6) 7 libtvm_runtime.dylib 0x0000000108f479b6 tvm::runtime::Module::LoadFromFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 598
[bt] (7) 8 mlc_chat_cli 0x0000000108c684c2 main + 8402
[bt] (8) 9 dyld 0x00007ff808ef341f start + 1903
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ which mlc_chat_cli
/Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ ls -al /Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli
-rwxrwxr-x 2 tomj staff 203288 Apr 29 19:00 /Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ file /Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli
/Users/tomj/anaconda3/envs/mlc-chat2/bin/mlc_chat_cli: Mach-O 64-bit executable x86_64
The key part for me is this:
(mlc-chat2) tomj@marvin:/Users/tomj/src/mlc-llm$ file dist/lib/vicuna-v1-7b_metal_float16.so
dist/lib/vicuna-v1-7b_metal_float16.so: Mach-O 64-bit dynamically linked shared library arm64
It's provided an ARM64 library which I won't be able to run. It fetches that from https://github.com/mlc-ai/binary-mlc-llm-libs.git but I couldn't find any source code associated with that lib so I wasn't sure if I could try re-compiling it for x64 (sorry if I missed it anywhere).
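The `file` check above is the key diagnostic. For anyone curious, the same check can be done without the `file` tool; here is a minimal sketch (my own helper, not part of MLC) that reads the CPU type straight from the Mach-O header, assuming a thin 64-bit Mach-O file rather than a fat/universal binary:

```python
import struct

# Constants from Apple's <mach-o/loader.h> and <mach/machine.h>
MH_MAGIC_64 = 0xFEEDFACF       # 64-bit Mach-O, native byte order
CPU_TYPE_X86_64 = 0x01000007   # CPU_TYPE_X86 | CPU_ARCH_ABI64
CPU_TYPE_ARM64 = 0x0100000C    # CPU_TYPE_ARM | CPU_ARCH_ABI64

def macho_arch(path):
    """Return 'x86_64', 'arm64', or 'unknown' for a thin 64-bit Mach-O file."""
    with open(path, "rb") as f:
        magic, cputype = struct.unpack("<II", f.read(8))
    if magic != MH_MAGIC_64:
        return "unknown"  # fat binaries (0xCAFEBABE magic) not handled here
    return {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}.get(cputype, "unknown")
```

Running `macho_arch("dist/lib/vicuna-v1-7b_metal_float16.so")` on the library above would report `arm64`, while the `mlc_chat_cli` binary would report `x86_64`, which is exactly the mismatch dlopen complains about.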
Ah, got it! The stack trace is super informative and helpful!
The issue comes from host code generation (the part that launches GPU kernels): we forgot to tweak the LLVM parameters to generate x86 code, which should be easy to fix. @tqchen and I are working on it!
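To illustrate the host-codegen point: the host side of the compiled library must match the architecture of the machine loading it, independent of the GPU backend. A hypothetical helper (`host_triple` is my own name for illustration, not an MLC or TVM API) that picks the matching LLVM target triple might look like:

```python
import platform

def host_triple():
    """Pick an LLVM host triple matching the current machine (macOS assumed).

    The bug in this thread: the nightly libs were built with an arm64 host
    triple, so the x86_64 mlc_chat_cli on an Intel Mac could not dlopen them.
    """
    machine = platform.machine()
    if machine in ("arm64", "aarch64"):
        return "arm64-apple-darwin"
    if machine in ("x86_64", "AMD64"):
        return "x86_64-apple-darwin"
    raise ValueError(f"unsupported machine: {machine}")
```

A triple like this would then be passed to the compiler's host target (for example, an LLVM target string along the lines of `llvm -mtriple=x86_64-apple-darwin`); the exact plumbing inside MLC's build scripts is not shown here.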
That's great to hear, thank you so much!
@TheBloke we just sent two fixes above, and now it works on my Intel MacBook :-) We'll need an extra ~30 min to rebuild the conda package, and then we'll validate again!
Wonderful, thanks!
Just validated that the new conda packages work on both x86 and ARM MacBooks!
(screenshots: working on an Intel MacBook and an ARM MacBook)
@TheBloke you will have to upgrade to the latest mlc-chat-nightly package from conda and update the dist/lib directory to make it work :-)
It's working! I grabbed the new dylib from the commit and it's working beautifully! Thanks guys!
Thanks for reporting this issue! Extremely helpful and timely for us to improve our packaging experience!
So cool!
Not at all, thanks so much for the very quick response and the amazing code.
Will you guys be publishing the source code for the Metal lib as well? Or is it already in another repo that I missed?
The next thing I'd love to try is expanding this to other models. I release lots of models over on HF (https://huggingface.co/TheBloke) and I'd love to look into making versions of those for MLC/WebLLM and publishing them on HF, e.g. Vicuna 1.1 13B, WizardLM 7B, StableVicuna, OpenAssistant, etc.
I saw the code over on WebLLM for building models. Is that the same code that's used to make the dist/vicuna shards we use here?
The metallib is generated through the TVM pipeline, which also lets it be universally optimized and mapped to the backend of interest, including Vulkan and Metal. So it is already in this repo. A lot of the current dispatchers are here: https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/transform/dispatch_tir_operator.py#L3
Yes, WebLLM and MLC-LLM share most of their common code, and the shards are the same. We'd love to work with the community to bring support for more models and enable more people to do this together. Check out https://mlc.ai/ for some background info.
@TheBloke Hey, just wanted to further clarify one thing: there are two levels of IRs in the TVM Unity compiler. The graph-level one is called Relax, and the loop-level one is called TensorIR, or TIR. Depending on how deep you want to customize, you may use only one or both. At the graph level there are nn.Module APIs that are similar to PyTorch, so it won't be too challenging if you already know how to build stuff there. Warning: these APIs are still quite sloppy at the moment; we want to refactor them a bit further in a few weeks and merge them back into the TVM Unity repo.
Regarding publishing the source code for the Metal lib: as you might already be able to tell, it is open-sourced as TIR functions already, which means we didn't write a single line of Vulkan, CUDA, or Metal at all. The Metal lib is simply the binary artifact built by the TVM Unity compiler, which handles everything already!
Ahh I see, OK wonderful. Thanks very much for the detailed explanations, @tqchen and @junrushao!
That's a really cool system!
I will have a go at making my own model and let you know if I succeed.
Thanks again for the great support and for all the work you guys have done on this project. It really is a great idea, and it could be a game changer for everyone who doesn't have an NVIDIA GPU and is currently left out of easy GPU inference.
I tried WebLLM the other week and was really blown away. I have an Intel macOS system with AMD 6900XT GPU and using WebLLM was the first time I'd had decent GPU inference on this system.
Now I'd love to try mlc-llm as well. I followed the instructions, but the pre-built Metal lib for macOS is built for ARM64/Silicon.
Where can I find the source for this so I can try compiling it myself?