Closed jparismorgan closed 1 year ago
https://github.com/mlc-ai/web-llm/issues/80 this might help.
@yzh119 / @jan-www / @yongwww so do I need to build TVM from source? Is this because `pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels` does not support building for WebGPU? If so, perhaps we could update the getting started docs to include how to build for WebGPU?
Anyways, this is what I did:
~/repo$ git clone https://github.com/mlc-ai/relax.git
~/repo$ cd relax
~/repo/relax$ ./tests/scripts/task_web_wasm.sh
If the script fails at the `npm run lint` step, comment that out and try again.
Next, `emcc` is not found:
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
/bin/sh: emcc: command not found
make: *** [dist/wasm/wasm_runtime.bc] Error 127
~/repo/relax$ brew install emscripten
~/repo/relax$ emcc
shared:INFO: (Emscripten: Running sanity checks)
emcc: error: no input files
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
cache:INFO: generating system headers: sysroot_install.stamp... (this will be cached in "/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot_install.stamp" for subsequent builds)
cache:INFO: - ok
In file included from emcc/wasm_runtime.cc:32:
/Users/parismorgan/repo/relax/include/tvm/runtime/c_runtime_api.h:79:10: fatal error: 'dlpack/dlpack.h' file not found
79 | #include <dlpack/dlpack.h>
| ^~~~~~~~~~~~~~~~~
1 error generated.
emcc: error: '/usr/local/Cellar/emscripten/3.1.41/libexec/llvm/bin/clang++ -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN --sysroot=/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc' failed (returned 1)
make: *** [dist/wasm/wasm_runtime.bc] Error 1
~/repo/relax$ git submodule update --init
~/repo/relax$ ./tests/scripts/task_web_wasm.sh
++ pwd
+ export PYTHONPATH=/Users/parismorgan/repo/relax/python
+ PYTHONPATH=/Users/parismorgan/repo/relax/python
+ rm -rf .emscripten_cache
+ cd web
+ make clean
+ npm install
up to date, audited 760 packages in 972ms
61 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
tvmjs@0.13.0-dev0 prepwasm make && python3 tests/python/prepare_test_libs.py
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc >dist/wasm/tvmjs_support.d
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc >dist/wasm/webgpu_runtime.d
emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc
emcc/webgpu_runtime.cc:206:71: error: non-virtual member function marked 'final' hides virtual member function
206 | void SaveToFile(const String& file_name, const std::string& format) final {
| ^
/Users/parismorgan/repo/relax/include/tvm/runtime/module.h:174:16: note: hidden overloaded virtual function 'tvm::runtime::ModuleNode::SaveToFile' declared here: type mismatch at 2nd parameter ('const String &' vs 'const std::string &' (aka 'const basic_string
So, my questions:
1. Do you see anything I'm missing?
2. Have others run into this issue? Or do they avoid it because they are on different hardware or software versions?
Thank you!
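For anyone hitting the same failures: the two missing prerequisites in the log above (`emcc` not on the PATH, and the `dlpack/dlpack.h` header missing until `git submodule update --init` is run) can be checked before running the script. This is only a rough sketch. The helper names `missing_tools` and `submodules_ready` are made up for illustration and are not part of TVM or mlc-llm, and the header path is taken from the include flags in the compile command.

```shell
# Hypothetical preflight helpers; not part of TVM or mlc-llm.

# Print every tool named in the arguments that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Succeed only if the dlpack header that the compile step includes
# exists under the TVM checkout passed as the first argument.
submodules_ready() {
  tvm_root="$1"
  [ -f "$tvm_root/3rdparty/dlpack/include/dlpack/dlpack.h" ]
}
```

For example, `missing_tools emcc npm` prints nothing when both tools are installed, and `submodules_ready ~/repo/relax` should fail until the submodules have been checked out.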
Thank you for reporting. This is a gap in our installation docs that we will fix. We will report back here.
Thank you! I've gotten it working! Here are my steps, sharing because there are a few rough parts that probably have proper fixes we could do instead:
1. `relax`: I realized it is a submodule of `mlc-llm`. Downloaded it with `~/repo/mlc-llm git submodule update --init --recursive`.
2. `3rdparty/tvm/tests/scripts/task_web_wasm.sh`: I commented out `npm run lint`.
3. `3rdparty/tvm/web/emcc/webgpu_runtime.cc`: I changed from `void SaveToFile(const String& file_name, const std::string& format) final {` to `void SaveToFile(const String& file_name, const std::string& format) {`.
4. I ran `./tests/scripts/task_web_wasm.sh` from `~/repo/mlc-llm/3rdparty/tvm`. I got an error:
(mlc-llm) ~/repo/mlc-llm/3rdparty/tvm ./tests/scripts/task_web_wasm.sh
++ pwd
+ export PYTHONPATH=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python
+ PYTHONPATH=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python
+ rm -rf .emscripten_cache
+ cd web
+ make clean
+ npm install
up to date, audited 760 packages in 1s
61 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
tvmjs@0.13.0-dev0 prepwasm make && python3 tests/python/prepare_test_libs.py
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc >dist/wasm/tvmjs_support.d
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc >dist/wasm/webgpu_runtime.d
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc
emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -o dist/wasm/tvmjs_runtime.js dist/wasm/wasm_runtime.bc dist/wasm/tvmjs_support.bc dist/wasm/webgpu_runtime.bc --no-entry -s WASM_BIGINT=1 -s ALLOW_MEMORY_GROWTH=1 -s STANDALONE_WASM=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 --pre-js emcc/preload.js
warning: undefined symbol: TVMWasmPackedCFunc (referenced by top-level compiled C/C++ code)
warning: undefined symbol: TVMWasmPackedCFuncFinalizer (referenced by top-level compiled C/C++ code)
warning: undefined symbol: _ZN3tvm7runtime9threading10NumThreadsEv (referenced by top-level compiled C/C++ code)
warning: undefined symbol: _ZN3tvm7runtime9threading15ResetThreadPoolEv (referenced by top-level compiled C/C++ code)
emcc: warning: warnings in JS library compilation [-Wjs-compiler]
python3 emcc/decorate_as_wasi.py dist/wasm/tvmjs_runtime.js dist/wasm/tvmjs_runtime.wasi.js cjs
python3 emcc/decorate_as_wasi.py dist/wasm/tvmjs_runtime.js src/tvmjs_runtime_wasi.js es
Traceback (most recent call last):
File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/tests/python/prepare_test_libs.py", line 19, in
There are two interesting notes about this:
1. I do have tvm working I believe, but maybe we need something else to be able to find `'libtvm.dylib'` and `'libtvm_runtime.dylib'`? I double checked and this still works:
(mlc-llm) ~/repo/mlc-llm/3rdparty/tvm python -c "import tvm; print(tvm.metal().exist)"
[15:11:35] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
[15:11:35] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
True
2. Even though we got an error, it still looks like we produced the artifacts we need. The original error I ran into was that running `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` led to `Cannot find libraries: wasm_runtime.bc`. But I can see that after running `task_web_wasm.sh` we have generated `wasm_runtime.bc`:
~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm$ ls -lah
total 14824
drwxr-xr-x@ 11 parismorgan staff 352B Jun 21 15:10 .
drwxr-xr-x@ 6 parismorgan staff 192B Jun 21 15:09 ..
-rw-r--r--@ 1 parismorgan staff 81K Jun 21 15:10 tvmjs_runtime.js
-rw-r--r--@ 1 parismorgan staff 81K Jun 21 15:10 tvmjs_runtime.wasi.js
-rwxr-xr-x@ 1 parismorgan staff 1.1M Jun 21 15:10 tvmjs_runtime.wasm
-rw-r--r--@ 1 parismorgan staff 155K Jun 21 15:09 tvmjs_support.bc
-rw-r--r--@ 1 parismorgan staff 3.3K Jun 21 15:09 tvmjs_support.d
-rw-r--r--@ 1 parismorgan staff 3.5M Jun 21 15:09 wasm_runtime.bc
-rw-r--r--@ 1 parismorgan staff 9.8K Jun 21 15:09 wasm_runtime.d
-rw-r--r--@ 1 parismorgan staff 183K Jun 21 15:09 webgpu_runtime.bc
-rw-r--r--@ 1 parismorgan staff 4.1K Jun 21 15:09 webgpu_runtime.d
The problem though is that when I run `~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` it doesn't know to look for `wasm_runtime.bc` in `~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm`.
To work around that, I manually copied `wasm_runtime.bc`, `tvmjs_support.bc`, and `webgpu_runtime.bc` into `/usr/local/bin`. With that done, I can then build successfully:
(mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0
Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Load cached module from dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/mod_cache_before_build_webgpu.pkl and skip tracing. You can use --use-cache=0 to retrace
And I can verify the artifacts look correct:
(mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0
RedPajama-INCITE-Chat-3B-v1-q4f16_0-metal_x86_64.dylib mod_cache_before_build_metal_x86_64.pkl params
RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm mod_cache_before_build_webgpu.pkl
(mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params
mlc-chat-config.json params_shard_12.bin params_shard_18.bin params_shard_23.bin params_shard_29.bin params_shard_34.bin params_shard_4.bin params_shard_45.bin params_shard_50.bin tokenizer_config.json
ndarray-cache.json params_shard_13.bin params_shard_19.bin params_shard_24.bin params_shard_3.bin params_shard_35.bin params_shard_40.bin params_shard_46.bin params_shard_6.bin
params_shard_0.bin params_shard_14.bin params_shard_2.bin params_shard_25.bin params_shard_30.bin params_shard_36.bin params_shard_41.bin params_shard_47.bin params_shard_7.bin
params_shard_1.bin params_shard_15.bin params_shard_20.bin params_shard_26.bin params_shard_31.bin params_shard_37.bin params_shard_42.bin params_shard_48.bin params_shard_8.bin
params_shard_10.bin params_shard_16.bin params_shard_21.bin params_shard_27.bin params_shard_32.bin params_shard_38.bin params_shard_43.bin params_shard_49.bin params_shard_9.bin
params_shard_11.bin params_shard_17.bin params_shard_22.bin params_shard_28.bin params_shard_33.bin params_shard_39.bin params_shard_44.bin params_shard_5.bin tokenizer.json
And I can load the model up in `web-llm/examples/simple-chat` and have it work 🥳
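In case it helps others debug the `Cannot find libraries: wasm_runtime.bc` error, here is a rough sketch of a check for the bitcode files before running `build.py`. The three file names come from the `ls` output above. Whether `build.py` needs all three is my assumption, and `missing_wasm_libs` is a made-up helper name, not something provided by TVM or mlc-llm.

```shell
# Hypothetical helper; not part of TVM or mlc-llm.
# Print each expected bitcode file that is absent from the directory
# given as the first argument (e.g. 3rdparty/tvm/web/dist/wasm).
missing_wasm_libs() {
  wasm_dir="$1"
  for lib in wasm_runtime.bc tvmjs_support.bc webgpu_runtime.bc; do
    if [ ! -f "$wasm_dir/$lib" ]; then
      echo "$lib"
    fi
  done
}
```

Running `missing_wasm_libs ~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm` should print nothing once `task_web_wasm.sh` has produced the files. Any name it prints is a file that still needs to be built.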
Please checkout
the latest docs, including instructions for the web build dependencies, will also be up soon at https://mlc.ai/mlc-llm/docs/compilation/compile_models.html
@jparismorgan would you mind checking out https://mlc.ai/mlc-llm/docs/install/emcc.html#install-web-build to see if it works for you?
Awesome it worked, thank you!
(mlc-llm) ~/repo/mlc-llm ./scripts/prep_emcc_deps.sh
+ emcc --version
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.41-git
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ npm --version
9.5.1
+ TVM_HOME_SET=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm
+ git submodule update --init --recursive
+ [[ -z /Users/parismorgan/repo/mlc-llm/3rdparty/tvm ]]
+ cd /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web
+ make
make: Nothing to be done for `all'.
+ cd -
/Users/parismorgan/repo/mlc-llm
(mlc-llm) ~/repo/mlc-llm echo ${TVM_HOME}
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm
(mlc-llm) ~/repo/mlc-llm ls -l ${TVM_HOME}/web/dist/wasm/*.bc
-rw-r--r--@ 1 parismorgan staff 158252 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/tvmjs_support.bc
-rw-r--r--@ 1 parismorgan staff 3713032 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/wasm_runtime.bc
-rw-r--r--@ 1 parismorgan staff 187676 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/webgpu_runtime.bc
(mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f32_0
Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Load cached module from dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/mod_cache_before_build_webgpu.pkl and skip tracing. You can use --use-cache=0 to retrace
- Dispatch to pre-scheduled op: matmul4
- Dispatch to pre-scheduled op: fused_NT_matmul_add
- Dispatch to pre-scheduled op: softmax1
- Dispatch to pre-scheduled op: fused_NT_matmul2_add2_gelu
- Dispatch to pre-scheduled op: fused_NT_matmul_add_add1
- Dispatch to pre-scheduled op: fused_NT_matmul3_add_cast_add1
- Dispatch to pre-scheduled op: matmul8
- Dispatch to pre-scheduled op: softmax
- Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
- Dispatch to pre-scheduled op: layer_norm
- Dispatch to pre-scheduled op: fused_NT_matmul4_divide1_maximum1_minimum1
- Dispatch to pre-scheduled op: fused_NT_matmul1_divide_maximum_minimum
[16:19:44] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Finish exporting to dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/RedPajama-INCITE-Chat-3B-v1-q4f32_0-webgpu.wasm
Finish exporting chat config to dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/params/mlc-chat-config.json
Thank you @jparismorgan !
🐛 Bug
I am trying to compile the example model with `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` from the docs, but I get `RuntimeError: Cannot find libraries: wasm_runtime.bc`.
To Reproduce
Steps to reproduce the behavior:
I am following the instructions to compile a model for WebGPU: https://mlc.ai/mlc-llm/docs/compilation/compile_models.html
git clone https://github.com/mlc-ai/mlc-llm.git
cd ~/repo/mlc-llm
pip install torch
pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels
python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0
When I check the output, I can see that `dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm` doesn't exist.
Expected behavior
There is no error and the `dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm` file is created.
Environment
How MLC-LLM was installed (`conda`, source): Cloned https://github.com/mlc-ai/mlc-llm
How TVM was installed (`pip`, source): `pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels`
TVM hash tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):
Additional context
Please let me know if there is something I missed in the setup, thank you for any help!