[Bug] Compiling for WebGPU on OSX leads to `RuntimeError: Cannot find libraries: wasm_runtime.bc`

jparismorgan commented 1 year ago

🐛 Bug

I am trying to compile the example model from the docs with `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0`, but I get `RuntimeError: Cannot find libraries: wasm_runtime.bc`.

To Reproduce

Steps to reproduce the behavior:

I am following the instructions to compile a model for WebGPU: https://mlc.ai/mlc-llm/docs/compilation/compile_models.html

git clone https://github.com/mlc-ai/mlc-llm.git
cd ~/repo/mlc-llm
Create a new virtual environment and activate it
pip install torch
pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels

Run `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0`:

(mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0                                                                                                                                                     
Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
[23:07:34] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
[23:07:34] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Host CPU dection:
Target triple: x86_64-apple-darwin22.3.0
Process triple: x86_64-apple-darwin22.3.0
Host CPU: skylake
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 1.4645195007324219 GB
Start storing to cache dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params
[0710/0710] saving param_709
All finished, 51 total shards committed, record saved to dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params/ndarray-cache.json
Save a cached module to dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/mod_cache_before_build_webgpu.pkl.
- Dispatch to pre-scheduled op: fused_softmax2_cast10
- Dispatch to pre-scheduled op: matmul3
- Dispatch to pre-scheduled op: fused_NT_matmul_add_add1
- Dispatch to pre-scheduled op: matmul8
- Dispatch to pre-scheduled op: fused_NT_matmul3_add3_cast1_cast5_add1_cast
- Dispatch to pre-scheduled op: fused_NT_matmul1_divide1_maximum_minimum_cast2
- Dispatch to pre-scheduled op: fused_NT_matmul_add
- Dispatch to pre-scheduled op: layer_norm
- Dispatch to pre-scheduled op: fused_NT_matmul4_divide2_maximum1_minimum1_cast9
- Dispatch to pre-scheduled op: fused_NT_matmul2_add2_gelu_cast4
- Dispatch to pre-scheduled op: fused_layer_norm_cast1
- Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
- Dispatch to pre-scheduled op: fused_NT_matmul3_add3_cast1_cast5_add1
- Dispatch to pre-scheduled op: fused_softmax1_cast3
[23:08:41] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Traceback (most recent call last):
File "/Users/parismorgan/repo/mlc-llm/build.py", line 432, in <module>
main()
File "/Users/parismorgan/repo/mlc-llm/build.py", line 424, in main
build(mod, ARGS)
File "/Users/parismorgan/repo/mlc-llm/build.py", line 384, in build
ex.export_library(lib_path, **args.export_kwargs)
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/relax/vm_build.py", line 147, in export_library
return self.mod.export_library(
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/runtime/module.py", line 598, in export_library
return fcompile(file_name, files, **kwargs)
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/contrib/emcc.py", line 60, in create_tvmjs_wasm
libs += [find_lib_path("wasm_runtime.bc")[0]]
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/_ffi/libinfo.py", line 152, in find_lib_path
raise RuntimeError(message)
RuntimeError: Cannot find libraries: wasm_runtime.bc
List of candidates:
/Users/parismorgan/virtualenvs/mlc-llm/bin/wasm_runtime.bc
/Users/parismorgan/.nvm/versions/node/v18.16.0/bin/wasm_runtime.bc
/usr/local/bin/wasm_runtime.bc
/System/Volumes/Preboot/Cryptexes/App/usr/bin/wasm_runtime.bc
/usr/bin/wasm_runtime.bc
/bin/wasm_runtime.bc
/usr/sbin/wasm_runtime.bc
/sbin/wasm_runtime.bc
/Users/parismorgan/.nvm/versions/node/v18.16.0/bin/wasm_runtime.bc
/usr/local/Cellar/ccache/4.8/libexec/wasm_runtime.bc
/usr/local/Cellar/openjdk/20.0.1/bin/wasm_runtime.bc
/usr/local/Cellar/opencv@3/3.4.16_7/bin/wasm_runtime.bc
/usr/local/Cellar/ccache/4.8/libexec/wasm_runtime.bc
/usr/local/Cellar/openjdk/20.0.1/bin/wasm_runtime.bc
/usr/local/Cellar/opencv@3/3.4.16_7/bin/wasm_runtime.bc
/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/wasm_runtime.bc
/Users/parismorgan/virtualenvs/mlc-llm/lib/wasm_runtime.bc
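
The list of candidates suggests that the lookup is a plain file-existence check over a fixed set of directories. A minimal sketch of that kind of check, using only the standard library, can help confirm that the file really is absent. This is not TVM's `find_lib_path`; `find_in_dirs` is a hypothetical helper, and the demo searches `PATH` only, whereas the list above also includes the tvm package directory.

```python
import os
from pathlib import Path


def find_in_dirs(filename, search_dirs):
    """Return every existing `search_dir/filename`, preserving search order."""
    hits = []
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Assumption: only the PATH directories are searched in this demo.
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    hits = find_in_dirs("wasm_runtime.bc", dirs)
    print(hits if hits else "wasm_runtime.bc not found in any PATH directory")
```

An empty result is consistent with the error above: none of the searched directories contains the file.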

And when I check the output I can see that `dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm` doesn't exist:

(mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0                                                                                                                                                                                                                        
mod_cache_before_build_webgpu.pkl params
(mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params                                                                                                                                                                                                                 
ndarray-cache.json    params_shard_12.bin   params_shard_17.bin   params_shard_21.bin   params_shard_26.bin   params_shard_30.bin   params_shard_35.bin   params_shard_4.bin    params_shard_44.bin   params_shard_49.bin   params_shard_8.bin
params_shard_0.bin    params_shard_13.bin   params_shard_18.bin   params_shard_22.bin   params_shard_27.bin   params_shard_31.bin   params_shard_36.bin   params_shard_40.bin   params_shard_45.bin   params_shard_5.bin    params_shard_9.bin
params_shard_1.bin    params_shard_14.bin   params_shard_19.bin   params_shard_23.bin   params_shard_28.bin   params_shard_32.bin   params_shard_37.bin   params_shard_41.bin   params_shard_46.bin   params_shard_50.bin   tokenizer.json
params_shard_10.bin   params_shard_15.bin   params_shard_2.bin    params_shard_24.bin   params_shard_29.bin   params_shard_33.bin   params_shard_38.bin   params_shard_42.bin   params_shard_47.bin   params_shard_6.bin    tokenizer_config.json
params_shard_11.bin   params_shard_16.bin   params_shard_20.bin   params_shard_25.bin   params_shard_3.bin    params_shard_34.bin   params_shard_39.bin   params_shard_43.bin   params_shard_48.bin   params_shard_7.bin

Expected behavior

The build completes without error and creates `dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm`.
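
A small sketch for checking this expectation after a build. The naming pattern `dist/<model>-<quantization>/<model>-<quantization>-<target>.wasm` is inferred from the single path in this report, so treat it as an assumption rather than a documented rule; `expected_wasm_path` is a hypothetical helper.

```python
from pathlib import Path


def expected_wasm_path(model, quantization, target="webgpu", dist_dir="dist"):
    """Build the artifact path using the pattern seen in this report (assumed)."""
    name = f"{model}-{quantization}"
    return Path(dist_dir) / name / f"{name}-{target}.wasm"


if __name__ == "__main__":
    path = expected_wasm_path("RedPajama-INCITE-Chat-3B-v1", "q4f16_0")
    # Run from the mlc-llm checkout so the relative path resolves.
    print(path, "exists" if path.is_file() else "is missing")
```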

Environment

Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): WebGPU
Operating system (e.g. Ubuntu/Windows/MacOS/...): MacOS 13.2.1 (22D68)
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 2018 x86 MacBook Pro
How you installed MLC-LLM (conda, source): Cloned https://github.com/mlc-ai/mlc-llm
How you installed TVM-Unity (pip, source): pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels
Python version (e.g. 3.10): 3.9.6
GPU driver version (if applicable):
CUDA/cuDNN version (if applicable):

TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):

(mlc-llm) ~/repo/mlc-llm python -c "import tvm; print(tvm.__file__)"                                                                                                                                                                                                                        
/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/__init__.py
(mlc-llm) ~/repo/mlc-llm python -c "import tvm; print(tvm._ffi.base._LIB)"                                                                                                                                                                                                                  
<CDLL '/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/libtvm.dylib', handle 7ff91e939300 at 0x10dfeab50>
(mlc-llm) ~/repo/mlc-llm python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"                                                                                                                                                                    
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: 
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 206661b4aecb362dd5bccfba4de9013f8175cf20
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-06-17 14:36:49 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 16.0.6
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION: 
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON

Any other relevant information:

Additional context

Please let me know if there is something I missed in the setup, thank you for any help!

yzh119 commented 1 year ago

This might help: https://github.com/mlc-ai/web-llm/issues/80

jparismorgan commented 1 year ago

@yzh119 / @jan-www / @yongwww So do I need to build TVM from source? Is this because the wheels installed by `pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels` do not support building for WebGPU? If so, perhaps we could update the getting started docs to explain how to build for WebGPU?

Anyways, this is what I did:

~/repo$ git clone https://github.com/mlc-ai/relax.git
~/repo$ cd relax
~/repo/relax$ ./tests/scripts/task_web_wasm.sh
This failed in the `npm run lint` step, so I commented that step out and tried again.

It then failed because `emcc` was not found:

emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
/bin/sh: emcc: command not found
make: *** [dist/wasm/wasm_runtime.bc] Error 127

I installed Emscripten with `brew install emscripten`.

Verify it's installed:

~/repo/relax$ emcc                                                                                         ✹mlc
shared:INFO: (Emscripten: Running sanity checks)
emcc: error: no input files
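
A quick way to check for the required tools before starting the build is to look them up on `PATH`. The sketch below uses only the standard library; `missing_tools` is a hypothetical helper, and the tool names are the ones that appear in this thread's build output (`emcc`, `npm`, `make`), not an official prerequisite list.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names taken from this thread; the real prerequisites may differ.
    missing = missing_tools(["emcc", "npm", "make"])
    print("missing tools:", missing or "none")
```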

I tried to build again but got a different error:

emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
cache:INFO: generating system headers: sysroot_install.stamp... (this will be cached in "/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot_install.stamp" for subsequent builds)
cache:INFO:  - ok
In file included from emcc/wasm_runtime.cc:32:
/Users/parismorgan/repo/relax/include/tvm/runtime/c_runtime_api.h:79:10: fatal error: 'dlpack/dlpack.h' file not found
79 | #include <dlpack/dlpack.h>
  |          ^~~~~~~~~~~~~~~~~
1 error generated.
emcc: error: '/usr/local/Cellar/emscripten/3.1.41/libexec/llvm/bin/clang++ -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN --sysroot=/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc' failed (returned 1)
make: *** [dist/wasm/wasm_runtime.bc] Error 1

The missing `dlpack/dlpack.h` header lives in a git submodule, so I ran `git submodule update --init` in `~/repo/relax`.
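
An uninitialized submodule usually shows up as an empty or missing directory, so the include directories passed to `emcc` can be checked before building. A minimal sketch, assuming the `-I` paths from the command above; `missing_include_dirs` is a hypothetical helper, and not every directory in the list is necessarily a submodule.

```python
from pathlib import Path

# Include directories taken from the -I flags in the emcc command above.
INCLUDE_DIRS = [
    "3rdparty/dlpack/include",
    "3rdparty/dmlc-core/include",
    "3rdparty/compiler-rt",
    "3rdparty/picojson",
]


def missing_include_dirs(repo_root, include_dirs=INCLUDE_DIRS):
    """Return the include dirs that are absent or empty under `repo_root`."""
    root = Path(repo_root)
    missing = []
    for rel in include_dirs:
        directory = root / rel
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the root of the relax (TVM) checkout.
    print("missing or empty:", missing_include_dirs(".") or "none")
```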

I then got this error:


~/repo/relax$ ./tests/scripts/task_web_wasm.sh
++ pwd
+ export PYTHONPATH=/Users/parismorgan/repo/relax/python
+ PYTHONPATH=/Users/parismorgan/repo/relax/python
+ rm -rf .emscripten_cache
+ cd web
+ make clean
+ npm install

up to date, audited 760 packages in 972ms

61 packages are looking for funding run npm fund for details

found 0 vulnerabilities

npm run prepwasm

tvmjs@0.13.0-dev0 prepwasm make && python3 tests/python/prepare_test_libs.py

emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc >dist/wasm/tvmjs_support.d
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc >dist/wasm/webgpu_runtime.d
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc
emcc/webgpu_runtime.cc:206:71: error: non-virtual member function marked 'final' hides virtual member function
206 | void SaveToFile(const String& file_name, const std::string& format) final {
    | ^
/Users/parismorgan/repo/relax/include/tvm/runtime/module.h:174:16: note: hidden overloaded virtual function 'tvm::runtime::ModuleNode::SaveToFile' declared here: type mismatch at 2nd parameter ('const String &' vs 'const std::string &' (aka 'const basic_string &'))
174 | virtual void SaveToFile(const String& file_name, const String& format);
    | ^
1 error generated.
emcc: error: '/usr/local/Cellar/emscripten/3.1.41/libexec/llvm/bin/clang++ -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN --sysroot=/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c emcc/webgpu_runtime.cc -o dist/wasm/webgpu_runtime.bc' failed (returned 1)
make: *** [dist/wasm/webgpu_runtime.bc] Error 1



So, my questions:
1. Do you see anything I'm missing?
2. Have others run into this issue, or do they avoid it because they are on different hardware or software versions?

Thank you!

tqchen commented 1 year ago

Thank you for reporting. This is a gap in our installation docs that we will fix. We will report back here.

jparismorgan commented 1 year ago

Thank you! I've gotten it working. Here are my steps; I'm sharing them because a few of the workarounds are rough and probably have proper fixes we could make instead:

Instead of cloning relax separately, I realized it is already a submodule of mlc-llm (at `3rdparty/tvm`), so I fetched it by running `git submodule update --init --recursive` in `~/repo/mlc-llm`.
In `3rdparty/tvm/tests/scripts/task_web_wasm.sh` I commented out `npm run lint`.
In `3rdparty/tvm/web/emcc/webgpu_runtime.cc` I removed `final`, changing `void SaveToFile(const String& file_name, const std::string& format) final {` to `void SaveToFile(const String& file_name, const std::string& format) {`.

I then built by running `./tests/scripts/task_web_wasm.sh` in `~/repo/mlc-llm/3rdparty/tvm` and got an error:


(mlc-llm) ~/repo/mlc-llm/3rdparty/tvm ./tests/scripts/task_web_wasm.sh                                                                                                                                                   ✹206661b4a 
++ pwd
+ export PYTHONPATH=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python
+ PYTHONPATH=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python
+ rm -rf .emscripten_cache
+ cd web
+ make clean
+ npm install

up to date, audited 760 packages in 1s

61 packages are looking for funding run npm fund for details

found 0 vulnerabilities

npm run prepwasm

tvmjs@0.13.0-dev0 prepwasm make && python3 tests/python/prepare_test_libs.py

emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc >dist/wasm/tvmjs_support.d
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc >dist/wasm/webgpu_runtime.d
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -o dist/wasm/tvmjs_runtime.js dist/wasm/wasm_runtime.bc dist/wasm/tvmjs_support.bc dist/wasm/webgpu_runtime.bc --no-entry -s WASM_BIGINT=1 -s ALLOW_MEMORY_GROWTH=1 -s STANDALONE_WASM=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 --pre-js emcc/preload.js
warning: undefined symbol: TVMWasmPackedCFunc (referenced by top-level compiled C/C++ code)
warning: undefined symbol: TVMWasmPackedCFuncFinalizer (referenced by top-level compiled C/C++ code)
warning: undefined symbol: _ZN3tvm7runtime9threading10NumThreadsEv (referenced by top-level compiled C/C++ code)
warning: undefined symbol: _ZN3tvm7runtime9threading15ResetThreadPoolEv (referenced by top-level compiled C/C++ code)
emcc: warning: warnings in JS library compilation [-Wjs-compiler]
python3 emcc/decorate_as_wasi.py dist/wasm/tvmjs_runtime.js dist/wasm/tvmjs_runtime.wasi.js cjs
python3 emcc/decorate_as_wasi.py dist/wasm/tvmjs_runtime.js src/tvmjs_runtime_wasi.js es
Traceback (most recent call last):
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/tests/python/prepare_test_libs.py", line 19, in <module>
import tvm
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/__init__.py", line 26, in <module>
from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
from .base import register_error
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/base.py", line 71, in <module>
_LIB, _LIB_NAME = _load_lib()
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/base.py", line 51, in _load_lib
lib_path = libinfo.find_lib_path()
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/libinfo.py", line 152, in find_lib_path
raise RuntimeError(message)
RuntimeError: Cannot find libraries: ['libtvm.dylib', 'libtvm_runtime.dylib']
List of candidates:
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/node_modules/.bin/libtvm.dylib
/Users/parismorgan/.nvm/versions/node/v18.16.0/lib/node_modules/npm/node_modules/@npmcli/run-script/lib/node-gyp-bin/libtvm.dylib
/Users/parismorgan/virtualenvs/mlc-llm/bin/libtvm.dylib
/Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm.dylib
/usr/local/bin/libtvm.dylib
/System/Volumes/Preboot/Cryptexes/App/usr/bin/libtvm.dylib
/usr/bin/libtvm.dylib
/bin/libtvm.dylib
/usr/sbin/libtvm.dylib
/sbin/libtvm.dylib
/Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm.dylib
/usr/local/Cellar/ccache/4.8/libexec/libtvm.dylib
/usr/local/Cellar/openjdk/20.0.1/bin/libtvm.dylib
/usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm.dylib
/usr/local/Cellar/ccache/4.8/libexec/libtvm.dylib
/usr/local/Cellar/openjdk/20.0.1/bin/libtvm.dylib
/usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/libtvm.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/libtvm.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/libtvm.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/libtvm.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/node_modules/.bin/libtvm_runtime.dylib
/Users/parismorgan/.nvm/versions/node/v18.16.0/lib/node_modules/npm/node_modules/@npmcli/run-script/lib/node-gyp-bin/libtvm_runtime.dylib
/Users/parismorgan/virtualenvs/mlc-llm/bin/libtvm_runtime.dylib
/Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm_runtime.dylib
/usr/local/bin/libtvm_runtime.dylib
/System/Volumes/Preboot/Cryptexes/App/usr/bin/libtvm_runtime.dylib
/usr/bin/libtvm_runtime.dylib
/bin/libtvm_runtime.dylib
/usr/sbin/libtvm_runtime.dylib
/sbin/libtvm_runtime.dylib
/Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm_runtime.dylib
/usr/local/Cellar/ccache/4.8/libexec/libtvm_runtime.dylib
/usr/local/Cellar/openjdk/20.0.1/bin/libtvm_runtime.dylib
/usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm_runtime.dylib
/usr/local/Cellar/ccache/4.8/libexec/libtvm_runtime.dylib
/usr/local/Cellar/openjdk/20.0.1/bin/libtvm_runtime.dylib
/usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm_runtime.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/libtvm_runtime.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/libtvm_runtime.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/libtvm_runtime.dylib
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/libtvm_runtime.dylib

There are two interesting notes about this:
1. I believe my installed tvm works, but maybe the script needs something else to be able to find `libtvm.dylib` and `libtvm_runtime.dylib`? I double-checked, and this still works:

(mlc-llm) ~/repo/mlc-llm/3rdparty/tvm python -c "import tvm; print(tvm.metal().exist)" 1 ↵ ✹206661b4a
[15:11:35] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
[15:11:35] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
True

2. Even though we got an error, it looks like the build still produced the artifacts we need. The original error was that running `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` failed with `Cannot find libraries: wasm_runtime.bc`, but after running `task_web_wasm.sh` I can see that `wasm_runtime.bc` has been generated:

~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm$ ls -lah ✹206661b4a
total 14824
drwxr-xr-x@ 11 parismorgan staff 352B Jun 21 15:10 .
drwxr-xr-x@ 6 parismorgan staff 192B Jun 21 15:09 ..
-rw-r--r--@ 1 parismorgan staff 81K Jun 21 15:10 tvmjs_runtime.js
-rw-r--r--@ 1 parismorgan staff 81K Jun 21 15:10 tvmjs_runtime.wasi.js
-rwxr-xr-x@ 1 parismorgan staff 1.1M Jun 21 15:10 tvmjs_runtime.wasm
-rw-r--r--@ 1 parismorgan staff 155K Jun 21 15:09 tvmjs_support.bc
-rw-r--r--@ 1 parismorgan staff 3.3K Jun 21 15:09 tvmjs_support.d
-rw-r--r--@ 1 parismorgan staff 3.5M Jun 21 15:09 wasm_runtime.bc
-rw-r--r--@ 1 parismorgan staff 9.8K Jun 21 15:09 wasm_runtime.d
-rw-r--r--@ 1 parismorgan staff 183K Jun 21 15:09 webgpu_runtime.bc
-rw-r--r--@ 1 parismorgan staff 4.1K Jun 21 15:09 webgpu_runtime.d


The problem though is that when I run `~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` it doesn't know to look for `wasm_runtime.bc` in `~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm`. 
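
Since the lookup only covers the candidate directories printed in the error, the generated files have to end up in one of them. A minimal sketch of such a copy step, assuming the three `.bc` files shown in the `dist/wasm` listing are the ones needed; `copy_bitcode` is a hypothetical helper, not part of TVM or mlc-llm.

```python
import shutil
from pathlib import Path

# The three bitcode files that appear in the dist/wasm listing.
BITCODE_FILES = ("wasm_runtime.bc", "tvmjs_support.bc", "webgpu_runtime.bc")


def copy_bitcode(src_dir, dest_dir, names=BITCODE_FILES):
    """Copy the bitcode files from src_dir to dest_dir; fail if any is missing."""
    src, dest = Path(src_dir), Path(dest_dir)
    missing = [name for name in names if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src}: {', '.join(missing)}")
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the destination path of each copied file.
    return [Path(shutil.copy2(src / name, dest / name)) for name in names]
```

For example, `copy_bitcode("3rdparty/tvm/web/dist/wasm", dest)` could be called with `dest` set to any directory from the candidate list. The `site-packages/tvm` directory of the virtual environment is on that list and avoids writing into a system directory.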

To work around that I manually copied `wasm_runtime.bc`, `tvmjs_support.bc`, and `webgpu_runtime.bc` into `/usr/local/bin`. With that done, I can then build successfully:

(mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0 1 ↵ ✹ ✭main
Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Load cached module from dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/mod_cache_before_build_webgpu.pkl and skip tracing. You can use --use-cache=0 to retrace

Dispatch to pre-scheduled op: fused_NT_matmul_add
Dispatch to pre-scheduled op: matmul3
Dispatch to pre-scheduled op: fused_layer_norm_cast1
Dispatch to pre-scheduled op: fused_softmax2_cast10
Dispatch to pre-scheduled op: matmul8
Dispatch to pre-scheduled op: fused_NT_matmul1_divide1_maximum_minimum_cast2
Dispatch to pre-scheduled op: fused_NT_matmul3_add3_cast1_cast5_add1_cast
Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
Dispatch to pre-scheduled op: fused_softmax1_cast3
Dispatch to pre-scheduled op: layer_norm
Dispatch to pre-scheduled op: fused_NT_matmul2_add2_gelu_cast4
Dispatch to pre-scheduled op: fused_NT_matmul4_divide2_maximum1_minimum1_cast9
Dispatch to pre-scheduled op: fused_NT_matmul_add_add1
Dispatch to pre-scheduled op: fused_NT_matmul3_add3_cast1_cast5_add1
[15:20:02] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Finish exporting to dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm
Finish exporting chat config to dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params/mlc-chat-config.json
```
And I can verify the artifacts look correct:
```
(mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0
RedPajama-INCITE-Chat-3B-v1-q4f16_0-metal_x86_64.dylib mod_cache_before_build_metal_x86_64.pkl params
RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm mod_cache_before_build_webgpu.pkl
(mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params
mlc-chat-config.json params_shard_12.bin params_shard_18.bin params_shard_23.bin params_shard_29.bin params_shard_34.bin params_shard_4.bin params_shard_45.bin params_shard_50.bin tokenizer_config.json
ndarray-cache.json params_shard_13.bin params_shard_19.bin params_shard_24.bin params_shard_3.bin params_shard_35.bin params_shard_40.bin params_shard_46.bin params_shard_6.bin
params_shard_0.bin params_shard_14.bin params_shard_2.bin params_shard_25.bin params_shard_30.bin params_shard_36.bin params_shard_41.bin params_shard_47.bin params_shard_7.bin
params_shard_1.bin params_shard_15.bin params_shard_20.bin params_shard_26.bin params_shard_31.bin params_shard_37.bin params_shard_42.bin params_shard_48.bin params_shard_8.bin
params_shard_10.bin params_shard_16.bin params_shard_21.bin params_shard_27.bin params_shard_32.bin params_shard_38.bin params_shard_43.bin params_shard_49.bin params_shard_9.bin
params_shard_11.bin params_shard_17.bin params_shard_22.bin params_shard_28.bin params_shard_33.bin params_shard_39.bin params_shard_44.bin params_shard_5.bin tokenizer.json
```
And I can load the model up in `web-llm/examples/simple-chat` and have it work 🥳
```

tqchen commented 1 year ago

Please checkout

The latest docs will be up soon as well at https://mlc.ai/mlc-llm/docs/compilation/compile_models.html, including instructions for the web build dependencies.

tqchen commented 1 year ago

@jparismorgan would you mind checking out https://mlc.ai/mlc-llm/docs/install/emcc.html#install-web-build to see if it works for you?

jparismorgan commented 1 year ago

Awesome, it worked, thank you!

(mlc-llm) ~/repo/mlc-llm ./scripts/prep_emcc_deps.sh
+ emcc --version
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.41-git
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ npm --version
9.5.1
+ TVM_HOME_SET=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm
+ git submodule update --init --recursive
+ [[ -z /Users/parismorgan/repo/mlc-llm/3rdparty/tvm ]]
+ cd /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web
+ make
make: Nothing to be done for `all'.
+ cd -
/Users/parismorgan/repo/mlc-llm
(mlc-llm) ~/repo/mlc-llm echo ${TVM_HOME}
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm
(mlc-llm) ~/repo/mlc-llm ls -l ${TVM_HOME}/web/dist/wasm/*.bc
-rw-r--r--@ 1 parismorgan  staff   158252 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/tvmjs_support.bc
-rw-r--r--@ 1 parismorgan  staff  3713032 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/wasm_runtime.bc
-rw-r--r--@ 1 parismorgan  staff   187676 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/webgpu_runtime.bc
(mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f32_0
Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Load cached module from dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/mod_cache_before_build_webgpu.pkl and skip tracing. You can use --use-cache=0 to retrace
- Dispatch to pre-scheduled op: matmul4
- Dispatch to pre-scheduled op: fused_NT_matmul_add
- Dispatch to pre-scheduled op: softmax1
- Dispatch to pre-scheduled op: fused_NT_matmul2_add2_gelu
- Dispatch to pre-scheduled op: fused_NT_matmul_add_add1
- Dispatch to pre-scheduled op: fused_NT_matmul3_add_cast_add1
- Dispatch to pre-scheduled op: matmul8
- Dispatch to pre-scheduled op: softmax
- Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
- Dispatch to pre-scheduled op: layer_norm
- Dispatch to pre-scheduled op: fused_NT_matmul4_divide1_maximum1_minimum1
- Dispatch to pre-scheduled op: fused_NT_matmul1_divide_maximum_minimum
[16:19:44] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Finish exporting to dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/RedPajama-INCITE-Chat-3B-v1-q4f32_0-webgpu.wasm
Finish exporting chat config to dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/params/mlc-chat-config.json
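
For anyone else who hits `Cannot find libraries: wasm_runtime.bc`, the manual `ls` check above can be sketched as a small pre-flight script. This is only an illustration, not part of mlc-llm: the helper name `missing_web_runtime_libs` is hypothetical, and it assumes (as the log above suggests, but I have not confirmed from the source) that the build locates the `.bc` files under `${TVM_HOME}/web/dist/wasm`.

```python
import os
from pathlib import Path

# File names taken from the `ls -l ${TVM_HOME}/web/dist/wasm/*.bc` output above.
REQUIRED_BC = ("tvmjs_support.bc", "wasm_runtime.bc", "webgpu_runtime.bc")


def missing_web_runtime_libs(tvm_home):
    """Return the required .bc files that are absent from <tvm_home>/web/dist/wasm.

    Hypothetical helper: the directory layout is an assumption based on the
    paths shown in this thread.
    """
    wasm_dir = Path(tvm_home) / "web" / "dist" / "wasm"
    return [name for name in REQUIRED_BC if not (wasm_dir / name).is_file()]


def describe_web_build_state(env=None):
    """Summarize whether TVM_HOME is set and the web runtime libraries exist."""
    env = os.environ if env is None else env
    tvm_home = env.get("TVM_HOME")
    if not tvm_home:
        return "TVM_HOME is not set"
    missing = missing_web_runtime_libs(tvm_home)
    if missing:
        return "missing under " + tvm_home + ": " + ", ".join(missing)
    return "all web runtime libraries found under " + tvm_home
```

Calling `print(describe_web_build_state())` before `python3 build.py ... --target webgpu` reports whether `TVM_HOME` is unset or which of the three files are missing, in which case rerunning `./scripts/prep_emcc_deps.sh` is the fix that worked for me.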

tqchen commented 1 year ago

Thank you @jparismorgan !
