mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Compiling for WebGPU on OSX leads to `RuntimeError: Cannot find libraries: wasm_runtime.bc` #452

Closed jparismorgan closed 1 year ago

jparismorgan commented 1 year ago

🐛 Bug

I am trying to compile the example model from the docs with `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0`, but I get `RuntimeError: Cannot find libraries: wasm_runtime.bc`.

To Reproduce

Steps to reproduce the behavior:

I am following the instructions to compile a model for WebGPU: https://mlc.ai/mlc-llm/docs/compilation/compile_models.html

  1. git clone https://github.com/mlc-ai/mlc-llm.git
  2. cd ~/repo/mlc-llm
  3. Create new virtual environment and activate it
  4. pip install torch
  5. pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels
  6. Run python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0:
    (mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0                                                                                                                                                     
    Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
    Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
    Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
    Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
    [23:07:34] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
    [23:07:34] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
    Host CPU dection:
    Target triple: x86_64-apple-darwin22.3.0
    Process triple: x86_64-apple-darwin22.3.0
    Host CPU: skylake
    Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
    Start computing and quantizing weights... This may take a while.
    Finish computing and quantizing weights.
    Total param size: 1.4645195007324219 GB
    Start storing to cache dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params
    [0710/0710] saving param_709
    All finished, 51 total shards committed, record saved to dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params/ndarray-cache.json
    Save a cached module to dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/mod_cache_before_build_webgpu.pkl.
    - Dispatch to pre-scheduled op: fused_softmax2_cast10
    - Dispatch to pre-scheduled op: matmul3
    - Dispatch to pre-scheduled op: fused_NT_matmul_add_add1
    - Dispatch to pre-scheduled op: matmul8
    - Dispatch to pre-scheduled op: fused_NT_matmul3_add3_cast1_cast5_add1_cast
    - Dispatch to pre-scheduled op: fused_NT_matmul1_divide1_maximum_minimum_cast2
    - Dispatch to pre-scheduled op: fused_NT_matmul_add
    - Dispatch to pre-scheduled op: layer_norm
    - Dispatch to pre-scheduled op: fused_NT_matmul4_divide2_maximum1_minimum1_cast9
    - Dispatch to pre-scheduled op: fused_NT_matmul2_add2_gelu_cast4
    - Dispatch to pre-scheduled op: fused_layer_norm_cast1
    - Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
    - Dispatch to pre-scheduled op: fused_NT_matmul3_add3_cast1_cast5_add1
    - Dispatch to pre-scheduled op: fused_softmax1_cast3
    [23:08:41] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
    Traceback (most recent call last):
    File "/Users/parismorgan/repo/mlc-llm/build.py", line 432, in <module>
    main()
    File "/Users/parismorgan/repo/mlc-llm/build.py", line 424, in main
    build(mod, ARGS)
    File "/Users/parismorgan/repo/mlc-llm/build.py", line 384, in build
    ex.export_library(lib_path, **args.export_kwargs)
    File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/relax/vm_build.py", line 147, in export_library
    return self.mod.export_library(
    File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/runtime/module.py", line 598, in export_library
    return fcompile(file_name, files, **kwargs)
    File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/contrib/emcc.py", line 60, in create_tvmjs_wasm
    libs += [find_lib_path("wasm_runtime.bc")[0]]
    File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/_ffi/libinfo.py", line 152, in find_lib_path
    raise RuntimeError(message)
    RuntimeError: Cannot find libraries: wasm_runtime.bc
    List of candidates:
    /Users/parismorgan/virtualenvs/mlc-llm/bin/wasm_runtime.bc
    /Users/parismorgan/.nvm/versions/node/v18.16.0/bin/wasm_runtime.bc
    /usr/local/bin/wasm_runtime.bc
    /System/Volumes/Preboot/Cryptexes/App/usr/bin/wasm_runtime.bc
    /usr/bin/wasm_runtime.bc
    /bin/wasm_runtime.bc
    /usr/sbin/wasm_runtime.bc
    /sbin/wasm_runtime.bc
    /Users/parismorgan/.nvm/versions/node/v18.16.0/bin/wasm_runtime.bc
    /usr/local/Cellar/ccache/4.8/libexec/wasm_runtime.bc
    /usr/local/Cellar/openjdk/20.0.1/bin/wasm_runtime.bc
    /usr/local/Cellar/opencv@3/3.4.16_7/bin/wasm_runtime.bc
    /usr/local/Cellar/ccache/4.8/libexec/wasm_runtime.bc
    /usr/local/Cellar/openjdk/20.0.1/bin/wasm_runtime.bc
    /usr/local/Cellar/opencv@3/3.4.16_7/bin/wasm_runtime.bc
    /Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/wasm_runtime.bc
    /Users/parismorgan/virtualenvs/mlc-llm/lib/wasm_runtime.bc

    And when I check the output I can see that dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm doesn't exist:

    (mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0                                                                                                                                                                                                                        
    mod_cache_before_build_webgpu.pkl params
    (mlc-llm) ~/repo/mlc-llm ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/params                                                                                                                                                                                                                 
    ndarray-cache.json    params_shard_12.bin   params_shard_17.bin   params_shard_21.bin   params_shard_26.bin   params_shard_30.bin   params_shard_35.bin   params_shard_4.bin    params_shard_44.bin   params_shard_49.bin   params_shard_8.bin
    params_shard_0.bin    params_shard_13.bin   params_shard_18.bin   params_shard_22.bin   params_shard_27.bin   params_shard_31.bin   params_shard_36.bin   params_shard_40.bin   params_shard_45.bin   params_shard_5.bin    params_shard_9.bin
    params_shard_1.bin    params_shard_14.bin   params_shard_19.bin   params_shard_23.bin   params_shard_28.bin   params_shard_32.bin   params_shard_37.bin   params_shard_41.bin   params_shard_46.bin   params_shard_50.bin   tokenizer.json
    params_shard_10.bin   params_shard_15.bin   params_shard_2.bin    params_shard_24.bin   params_shard_29.bin   params_shard_33.bin   params_shard_38.bin   params_shard_42.bin   params_shard_47.bin   params_shard_6.bin    tokenizer_config.json
    params_shard_11.bin   params_shard_16.bin   params_shard_20.bin   params_shard_25.bin   params_shard_3.bin    params_shard_34.bin   params_shard_39.bin   params_shard_43.bin   params_shard_48.bin   params_shard_7.bin
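
For anyone who wants to check quickly whether `wasm_runtime.bc` is visible anywhere on their machine before running the build, here is a minimal sketch. It is not TVM's actual lookup code: `candidate_dirs` and `find_in_dirs` are hypothetical helpers, and probing the `PATH` entries is only an assumption based on the candidate list printed in the traceback above.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_dirs(extra: Iterable[str] = ()) -> List[Path]:
    """Directories to probe: the PATH entries plus any extras, de-duplicated in order."""
    raw = os.environ.get("PATH", "").split(os.pathsep) + list(extra)
    seen, dirs = set(), []
    for entry in raw:
        if entry and entry not in seen:
            seen.add(entry)
            dirs.append(Path(entry))
    return dirs


def find_in_dirs(name: str, dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file called `name` inside `dirs`, or None."""
    for directory in dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    hit = find_in_dirs("wasm_runtime.bc", candidate_dirs())
    print(hit if hit else "wasm_runtime.bc not found in any candidate directory")
```

Extra directories (for example a local `web/dist/wasm` build folder) can be passed through the `extra` argument of `candidate_dirs`.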

Expected behavior

The build completes without error and creates the dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/RedPajama-INCITE-Chat-3B-v1-q4f16_0-webgpu.wasm file.

Environment

Additional context

Please let me know if there is something I missed in the setup, thank you for any help!

yzh119 commented 1 year ago

https://github.com/mlc-ai/web-llm/issues/80 this might help.

jparismorgan commented 1 year ago

@yzh119 / @jan-www / @yongwww so do I need to build TVM from source? Is this because pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels does not support building for WebGPU? If so, perhaps we could update the getting started docs to include how to build for WebGPU?

Anyways, this is what I did:

  1. ~/repo$ git clone https://github.com/mlc-ai/relax.git
  2. ~/repo$ cd relax
  3. ~/repo/relax$ ./tests/scripts/task_web_wasm.sh
  4. Have that fail in the npm run lint step, comment that out and try again.
  5. Fail because emcc is not found:
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
    /bin/sh: emcc: command not found
    make: *** [dist/wasm/wasm_runtime.bc] Error 127
  6. Install emscripten with ~/repo/relax$ brew install emscripten
  7. Verify it's installed:
    ~/repo/relax$ emcc                                                                                         ✹mlc
    shared:INFO: (Emscripten: Running sanity checks)
    emcc: error: no input files
  8. Try to build again but get an error:
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
    cache:INFO: generating system headers: sysroot_install.stamp... (this will be cached in "/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot_install.stamp" for subsequent builds)
    cache:INFO:  - ok
    In file included from emcc/wasm_runtime.cc:32:
    /Users/parismorgan/repo/relax/include/tvm/runtime/c_runtime_api.h:79:10: fatal error: 'dlpack/dlpack.h' file not found
    79 | #include <dlpack/dlpack.h>
      |          ^~~~~~~~~~~~~~~~~
    1 error generated.
    emcc: error: '/usr/local/Cellar/emscripten/3.1.41/libexec/llvm/bin/clang++ -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN --sysroot=/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc' failed (returned 1)
    make: *** [dist/wasm/wasm_runtime.bc] Error 1
  9. Run ~/repo/relax$ git submodule update --init
  10. I then get this error:
    
    ~/repo/relax$ ./tests/scripts/task_web_wasm.sh
    ++ pwd
    + export PYTHONPATH=/Users/parismorgan/repo/relax/python
    + PYTHONPATH=/Users/parismorgan/repo/relax/python
    + rm -rf .emscripten_cache
    + cd web
    + make clean
    + npm install

    up to date, audited 760 packages in 972ms

    61 packages are looking for funding
      run npm fund for details

    found 0 vulnerabilities

    tvmjs@0.13.0-dev0 prepwasm
    make && python3 tests/python/prepare_test_libs.py

    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc >dist/wasm/tvmjs_support.d
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc >dist/wasm/webgpu_runtime.d
    emcc -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc
    emcc/webgpu_runtime.cc:206:71: error: non-virtual member function marked 'final' hides virtual member function
    206 | void SaveToFile(const String& file_name, const std::string& format) final {
        | ^
    /Users/parismorgan/repo/relax/include/tvm/runtime/module.h:174:16: note: hidden overloaded virtual function 'tvm::runtime::ModuleNode::SaveToFile' declared here: type mismatch at 2nd parameter ('const String &' vs 'const std::string &' (aka 'const basic_string &'))
    174 | virtual void SaveToFile(const String& file_name, const String& format);
        | ^
    1 error generated.
    emcc: error: '/usr/local/Cellar/emscripten/3.1.41/libexec/llvm/bin/clang++ -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN --sysroot=/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -I/Users/parismorgan/repo/relax -I/Users/parismorgan/repo/relax/include -I/Users/parismorgan/repo/relax/3rdparty/dlpack/include -I/Users/parismorgan/repo/relax/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/relax/3rdparty/compiler-rt -I/Users/parismorgan/repo/relax/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c emcc/webgpu_runtime.cc -o dist/wasm/webgpu_runtime.bc' failed (returned 1)
    make: *** [dist/wasm/webgpu_runtime.bc] Error 1



So, my questions:
1. Do you see anything I'm missing?
2. Have others run into this issue? Or do they not run into it because they are on different hardware or software versions?

Thank you!
tqchen commented 1 year ago

Thank you for reporting. This is a gap in our installation docs that we will fix. Will report back here.

jparismorgan commented 1 year ago

Thank you! I've gotten it working! Here are my steps, sharing because there are a few rough parts that probably have proper fixes we could do instead:

  1. Instead of cloning relax I realized it is a submodule of mlc-llm. Downloaded it with ~/repo/mlc-llm git submodule update --init --recursive
  2. In 3rdparty/tvm/tests/scripts/task_web_wasm.sh I commented out npm run lint
  3. In 3rdparty/tvm/web/emcc/webgpu_runtime.cc I changed from void SaveToFile(const String& file_name, const std::string& format) final { to void SaveToFile(const String& file_name, const std::string& format) {.
  4. I then built with ~/repo/mlc-llm/3rdparty/tvm ./tests/scripts/task_web_wasm.sh. I got an error:
    
    (mlc-llm) ~/repo/mlc-llm/3rdparty/tvm ./tests/scripts/task_web_wasm.sh                                                                                                                                                   ✹206661b4a 
    ++ pwd
    + export PYTHONPATH=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python
    + PYTHONPATH=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python
    + rm -rf .emscripten_cache
    + cd web
    + make clean
    + npm install

    up to date, audited 760 packages in 1s

    61 packages are looking for funding
      run npm fund for details

    found 0 vulnerabilities

    tvmjs@0.13.0-dev0 prepwasm
    make && python3 tests/python/prepare_test_libs.py

    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc >dist/wasm/wasm_runtime.d
    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/wasm_runtime.bc emcc/wasm_runtime.cc
    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc >dist/wasm/tvmjs_support.d
    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/tvmjs_support.bc emcc/tvmjs_support.cc
    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -c -MM -MT dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc >dist/wasm/webgpu_runtime.d
    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -emit-llvm -c -o dist/wasm/webgpu_runtime.bc emcc/webgpu_runtime.cc
    emcc -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dlpack/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/dmlc-core/include -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/compiler-rt -I/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/3rdparty/picojson -O3 -std=c++17 -Wno-ignored-attributes -o dist/wasm/tvmjs_runtime.js dist/wasm/wasm_runtime.bc dist/wasm/tvmjs_support.bc dist/wasm/webgpu_runtime.bc --no-entry -s WASM_BIGINT=1 -s ALLOW_MEMORY_GROWTH=1 -s STANDALONE_WASM=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 --pre-js emcc/preload.js
    warning: undefined symbol: TVMWasmPackedCFunc (referenced by top-level compiled C/C++ code)
    warning: undefined symbol: TVMWasmPackedCFuncFinalizer (referenced by top-level compiled C/C++ code)
    warning: undefined symbol: _ZN3tvm7runtime9threading10NumThreadsEv (referenced by top-level compiled C/C++ code)
    warning: undefined symbol: _ZN3tvm7runtime9threading15ResetThreadPoolEv (referenced by top-level compiled C/C++ code)
    emcc: warning: warnings in JS library compilation [-Wjs-compiler]
    python3 emcc/decorate_as_wasi.py dist/wasm/tvmjs_runtime.js dist/wasm/tvmjs_runtime.wasi.js cjs
    python3 emcc/decorate_as_wasi.py dist/wasm/tvmjs_runtime.js src/tvmjs_runtime_wasi.js es
    Traceback (most recent call last):
    File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/tests/python/prepare_test_libs.py", line 19, in <module>
    import tvm
    File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
    File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
    File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/base.py", line 71, in <module>
    _LIB, _LIB_NAME = _load_lib()
    File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/base.py", line 51, in _load_lib
    lib_path = libinfo.find_lib_path()
    File "/Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/_ffi/libinfo.py", line 152, in find_lib_path
    raise RuntimeError(message)
    RuntimeError: Cannot find libraries: ['libtvm.dylib', 'libtvm_runtime.dylib']
    List of candidates:
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/node_modules/.bin/libtvm.dylib
    /Users/parismorgan/.nvm/versions/node/v18.16.0/lib/node_modules/npm/node_modules/@npmcli/run-script/lib/node-gyp-bin/libtvm.dylib
    /Users/parismorgan/virtualenvs/mlc-llm/bin/libtvm.dylib
    /Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm.dylib
    /usr/local/bin/libtvm.dylib
    /System/Volumes/Preboot/Cryptexes/App/usr/bin/libtvm.dylib
    /usr/bin/libtvm.dylib
    /bin/libtvm.dylib
    /usr/sbin/libtvm.dylib
    /sbin/libtvm.dylib
    /Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm.dylib
    /usr/local/Cellar/ccache/4.8/libexec/libtvm.dylib
    /usr/local/Cellar/openjdk/20.0.1/bin/libtvm.dylib
    /usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm.dylib
    /usr/local/Cellar/ccache/4.8/libexec/libtvm.dylib
    /usr/local/Cellar/openjdk/20.0.1/bin/libtvm.dylib
    /usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/libtvm.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/libtvm.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/libtvm.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/libtvm.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/node_modules/.bin/libtvm_runtime.dylib
    /Users/parismorgan/.nvm/versions/node/v18.16.0/lib/node_modules/npm/node_modules/@npmcli/run-script/lib/node-gyp-bin/libtvm_runtime.dylib
    /Users/parismorgan/virtualenvs/mlc-llm/bin/libtvm_runtime.dylib
    /Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm_runtime.dylib
    /usr/local/bin/libtvm_runtime.dylib
    /System/Volumes/Preboot/Cryptexes/App/usr/bin/libtvm_runtime.dylib
    /usr/bin/libtvm_runtime.dylib
    /bin/libtvm_runtime.dylib
    /usr/sbin/libtvm_runtime.dylib
    /sbin/libtvm_runtime.dylib
    /Users/parismorgan/.nvm/versions/node/v18.16.0/bin/libtvm_runtime.dylib
    /usr/local/Cellar/ccache/4.8/libexec/libtvm_runtime.dylib
    /usr/local/Cellar/openjdk/20.0.1/bin/libtvm_runtime.dylib
    /usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm_runtime.dylib
    /usr/local/Cellar/ccache/4.8/libexec/libtvm_runtime.dylib
    /usr/local/Cellar/openjdk/20.0.1/bin/libtvm_runtime.dylib
    /usr/local/Cellar/opencv@3/3.4.16_9/bin/libtvm_runtime.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/python/tvm/libtvm_runtime.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/libtvm_runtime.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/libtvm_runtime.dylib
    /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/libtvm_runtime.dylib

There are two interesting notes about this:
1. I do have tvm working I believe, but maybe we need something else to be able to find `'libtvm.dylib'` and `'libtvm_runtime.dylib'`? I double checked and this still works:

    (mlc-llm) ~/repo/mlc-llm/3rdparty/tvm python -c "import tvm; print(tvm.metal().exist)"                                1 ↵ ✹206661b4a
    [15:11:35] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
    [15:11:35] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
    True

2. Even though we got an error, it still looks like we produced the artifacts we need. My original error was that running `python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` failed with `Cannot find libraries: wasm_runtime.bc`. But after running `task_web_wasm.sh` I can see that `wasm_runtime.bc` has been generated:

    ~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm$ ls -lah                                ✹206661b4a
    total 14824
    drwxr-xr-x@ 11 parismorgan  staff   352B Jun 21 15:10 .
    drwxr-xr-x@  6 parismorgan  staff   192B Jun 21 15:09 ..
    -rw-r--r--@  1 parismorgan  staff    81K Jun 21 15:10 tvmjs_runtime.js
    -rw-r--r--@  1 parismorgan  staff    81K Jun 21 15:10 tvmjs_runtime.wasi.js
    -rwxr-xr-x@  1 parismorgan  staff   1.1M Jun 21 15:10 tvmjs_runtime.wasm
    -rw-r--r--@  1 parismorgan  staff   155K Jun 21 15:09 tvmjs_support.bc
    -rw-r--r--@  1 parismorgan  staff   3.3K Jun 21 15:09 tvmjs_support.d
    -rw-r--r--@  1 parismorgan  staff   3.5M Jun 21 15:09 wasm_runtime.bc
    -rw-r--r--@  1 parismorgan  staff   9.8K Jun 21 15:09 wasm_runtime.d
    -rw-r--r--@  1 parismorgan  staff   183K Jun 21 15:09 webgpu_runtime.bc
    -rw-r--r--@  1 parismorgan  staff   4.1K Jun 21 15:09 webgpu_runtime.d


The problem though is that when I run `~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0` it doesn't know to look for `wasm_runtime.bc` in `~/repo/mlc-llm/3rdparty/tvm/web/dist/wasm`. 

To work around that, I manually copied `wasm_runtime.bc`, `tvmjs_support.bc`, and `webgpu_runtime.bc` into `/usr/local/bin`. With that done, I can then build successfully:

    (mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f16_0                                1 ↵ ✹ ✭main
    Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
    Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
    Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
    Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
    Load cached module from dist/RedPajama-INCITE-Chat-3B-v1-q4f16_0/mod_cache_before_build_webgpu.pkl and skip tracing. You can use --use-cache=0 to retrace
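
If someone needs to repeat the copy workaround described above, it could be scripted roughly like this. This is only a sketch of the manual step, not an officially supported fix: `copy_bitcode` is a hypothetical helper, and the source and destination directories are simply the ones used in this thread.

```python
import shutil
from pathlib import Path
from typing import List


def copy_bitcode(src_dir: str, dest_dir: str) -> List[str]:
    """Copy every *.bc file from src_dir into dest_dir and return the copied names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for bc_file in sorted(Path(src_dir).glob("*.bc")):
        # copy2 keeps timestamps, which makes it easier to spot stale copies later
        shutil.copy2(bc_file, dest / bc_file.name)
        copied.append(bc_file.name)
    return copied
```

For the layout in this thread the call would be `copy_bitcode("3rdparty/tvm/web/dist/wasm", "/usr/local/bin")`, run from the `mlc-llm` checkout. Writing to `/usr/local/bin` may require elevated permissions.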

tqchen commented 1 year ago

Please checkout

The latest docs will be up soon as well at https://mlc.ai/mlc-llm/docs/compilation/compile_models.html, including instructions for the web build dependencies.

tqchen commented 1 year ago

@jparismorgan would you mind checking out https://mlc.ai/mlc-llm/docs/install/emcc.html#install-web-build to see if it works for you?

jparismorgan commented 1 year ago

Awesome it worked, thank you!

(mlc-llm) ~/repo/mlc-llm ./scripts/prep_emcc_deps.sh                                                                                                                                    ✹ ✭main 
+ emcc --version
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.41-git
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ npm --version
9.5.1
+ TVM_HOME_SET=/Users/parismorgan/repo/mlc-llm/3rdparty/tvm
+ git submodule update --init --recursive
+ [[ -z /Users/parismorgan/repo/mlc-llm/3rdparty/tvm ]]
+ cd /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web
+ make
make: Nothing to be done for `all'.
+ cd -
/Users/parismorgan/repo/mlc-llm
(mlc-llm) ~/repo/mlc-llm echo ${TVM_HOME}                                                                                                                                               ✹ ✭main 
/Users/parismorgan/repo/mlc-llm/3rdparty/tvm
(mlc-llm) ~/repo/mlc-llm  ls -l ${TVM_HOME}/web/dist/wasm/*.bc                                                                                                                          ✹ ✭main 
-rw-r--r--@ 1 parismorgan  staff   158252 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/tvmjs_support.bc
-rw-r--r--@ 1 parismorgan  staff  3713032 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/wasm_runtime.bc
-rw-r--r--@ 1 parismorgan  staff   187676 Jun 21 15:09 /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/webgpu_runtime.bc
(mlc-llm) ~/repo/mlc-llm python3 build.py --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --target webgpu --quantization q4f32_0                                                 ✹ ✭main 
Weights exist at dist/models/RedPajama-INCITE-Chat-3B-v1, skipping download.
Using path "dist/models/RedPajama-INCITE-Chat-3B-v1" for model "RedPajama-INCITE-Chat-3B-v1"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Load cached module from dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/mod_cache_before_build_webgpu.pkl and skip tracing. You can use --use-cache=0 to retrace
- Dispatch to pre-scheduled op: matmul4
- Dispatch to pre-scheduled op: fused_NT_matmul_add
- Dispatch to pre-scheduled op: softmax1
- Dispatch to pre-scheduled op: fused_NT_matmul2_add2_gelu
- Dispatch to pre-scheduled op: fused_NT_matmul_add_add1
- Dispatch to pre-scheduled op: fused_NT_matmul3_add_cast_add1
- Dispatch to pre-scheduled op: matmul8
- Dispatch to pre-scheduled op: softmax
- Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
- Dispatch to pre-scheduled op: layer_norm
- Dispatch to pre-scheduled op: fused_NT_matmul4_divide1_maximum1_minimum1
- Dispatch to pre-scheduled op: fused_NT_matmul1_divide_maximum_minimum
[16:19:44] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Finish exporting to dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/RedPajama-INCITE-Chat-3B-v1-q4f32_0-webgpu.wasm
Finish exporting chat config to dist/RedPajama-INCITE-Chat-3B-v1-q4f32_0/params/mlc-chat-config.json
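
The `ls -l ${TVM_HOME}/web/dist/wasm/*.bc` check from the transcript above can also be written as a small preflight script. This is a sketch under assumptions: `missing_bitcode` is a hypothetical helper, and the three file names and the `web/dist/wasm` location are taken from the listing in this thread rather than from any documented contract.

```python
import os
from pathlib import Path
from typing import List

# The three bitcode files seen under ${TVM_HOME}/web/dist/wasm in this thread.
EXPECTED_BITCODE = ("tvmjs_support.bc", "wasm_runtime.bc", "webgpu_runtime.bc")


def missing_bitcode(tvm_home: str) -> List[str]:
    """Return the expected .bc files that are absent under tvm_home/web/dist/wasm."""
    wasm_dir = Path(tvm_home) / "web" / "dist" / "wasm"
    return [name for name in EXPECTED_BITCODE if not (wasm_dir / name).is_file()]


if __name__ == "__main__":
    tvm_home = os.environ.get("TVM_HOME")
    if not tvm_home:
        print("TVM_HOME is not set")
    else:
        missing = missing_bitcode(tvm_home)
        print("all bitcode files present" if not missing else f"missing: {missing}")
```

Running it before `build.py` gives an early hint that the web build dependencies were not prepared.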
tqchen commented 1 year ago

Thank you @jparismorgan !