Closed by jparismorgan 1 year ago
@jparismorgan Sorry, the gptj codebase is a little bit outdated, I'll fix it.
Thank you @yzh119! Just let me know if you have something you'd like help testing / validating, happy to give it a pass!
Hi @yzh119, thanks for the work on this! I just ran and confirmed it's working for the default build, but it looks like something is now up with the webgpu build. Here the osx build works okay:
(mlc-llm) ~/repo/mlc-llm python3 build.py --model gpt-j-6b --quantization q4f16_0
Using path "dist/models/gpt-j-6b" for model "gpt-j-6b"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
[09:54:01] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
[09:54:01] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Host CPU dection:
Target triple: x86_64-apple-darwin22.3.0
Process triple: x86_64-apple-darwin22.3.0
Host CPU: skylake
Target configured: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Load cached module from dist/gpt-j-6b-q4f16_0/mod_cache_before_build_metal.pkl and skip tracing. You can use --use-cache=0 to retrace
Finish exporting to dist/gpt-j-6b-q4f16_0/gpt-j-6b-q4f16_0-metal.so
Finish exporting chat config to dist/gpt-j-6b-q4f16_0/params/mlc-chat-config.json
And here is what happens when I build for webgpu:
(mlc-llm) ~/repo/mlc-llm python3 build.py --model gpt-j-6b --quantization q4f16_0 --target webgpu
Using path "dist/models/gpt-j-6b" for model "gpt-j-6b"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
[12:35:04] /Users/runner/work/package/package/tvm/include/tvm/topi/transform.h:1075: Warning: Fast mode segfaults when there are out-of-bounds indices. Make sure input indices are in bound
[12:35:05] /Users/runner/work/package/package/tvm/include/tvm/topi/transform.h:1075: Warning: Fast mode segfaults when there are out-of-bounds indices. Make sure input indices are in bound
[12:47:51] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 555X
[12:47:51] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Host CPU dection:
Target triple: x86_64-apple-darwin22.3.0
Process triple: x86_64-apple-darwin22.3.0
Host CPU: skylake
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
transformer.ln_f.weight
transformer.ln_f.bias
lm_head.weight
lm_head.bias
transformer.h.9.ln_1.weight
transformer.h.9.ln_1.bias
transformer.h.9.mlp.fc_out.weight
transformer.h.9.mlp.fc_out.bias
transformer.h.9.mlp.fc_in.weight
transformer.h.9.mlp.fc_in.bias
transformer.h.9.attn.out_proj.weight
transformer.h.9.attn.k_proj.weight
transformer.h.9.attn.v_proj.weight
transformer.h.9.attn.q_proj.weight
transformer.h.8.ln_1.weight
transformer.h.8.ln_1.bias
transformer.h.8.mlp.fc_out.weight
transformer.h.8.mlp.fc_out.bias
transformer.h.8.mlp.fc_in.weight
transformer.h.8.mlp.fc_in.bias
transformer.h.8.attn.out_proj.weight
transformer.h.8.attn.k_proj.weight
transformer.h.8.attn.v_proj.weight
transformer.h.8.attn.q_proj.weight
transformer.h.7.ln_1.weight
transformer.h.7.ln_1.bias
transformer.h.7.mlp.fc_out.weight
transformer.h.7.mlp.fc_out.bias
transformer.h.7.mlp.fc_in.weight
transformer.h.7.mlp.fc_in.bias
transformer.h.7.attn.out_proj.weight
transformer.h.7.attn.k_proj.weight
transformer.h.7.attn.v_proj.weight
transformer.h.7.attn.q_proj.weight
transformer.h.6.ln_1.weight
transformer.h.6.ln_1.bias
transformer.h.6.mlp.fc_out.weight
transformer.h.6.mlp.fc_out.bias
transformer.h.6.mlp.fc_in.weight
transformer.h.6.mlp.fc_in.bias
transformer.h.6.attn.out_proj.weight
transformer.h.6.attn.k_proj.weight
transformer.h.6.attn.v_proj.weight
transformer.h.6.attn.q_proj.weight
transformer.h.5.ln_1.weight
transformer.h.5.ln_1.bias
transformer.h.5.mlp.fc_out.weight
transformer.h.5.mlp.fc_out.bias
transformer.h.5.mlp.fc_in.weight
transformer.h.5.mlp.fc_in.bias
transformer.h.5.attn.out_proj.weight
transformer.h.5.attn.k_proj.weight
transformer.h.5.attn.v_proj.weight
transformer.h.5.attn.q_proj.weight
transformer.h.4.ln_1.weight
transformer.h.4.ln_1.bias
transformer.h.4.mlp.fc_out.weight
transformer.h.4.mlp.fc_out.bias
transformer.h.4.mlp.fc_in.weight
transformer.h.4.mlp.fc_in.bias
transformer.h.4.attn.out_proj.weight
transformer.h.4.attn.k_proj.weight
transformer.h.4.attn.v_proj.weight
transformer.h.4.attn.q_proj.weight
transformer.h.3.ln_1.weight
transformer.h.3.ln_1.bias
transformer.h.3.mlp.fc_out.weight
transformer.h.3.mlp.fc_out.bias
transformer.h.3.mlp.fc_in.weight
transformer.h.3.mlp.fc_in.bias
transformer.h.3.attn.out_proj.weight
transformer.h.3.attn.k_proj.weight
transformer.h.3.attn.v_proj.weight
transformer.h.3.attn.q_proj.weight
transformer.h.27.ln_1.weight
transformer.h.27.ln_1.bias
transformer.h.27.mlp.fc_out.weight
transformer.h.27.mlp.fc_out.bias
transformer.h.27.mlp.fc_in.weight
transformer.h.27.mlp.fc_in.bias
transformer.h.27.attn.out_proj.weight
transformer.h.27.attn.k_proj.weight
transformer.h.27.attn.v_proj.weight
transformer.h.27.attn.q_proj.weight
transformer.h.26.ln_1.weight
transformer.h.26.ln_1.bias
transformer.h.26.mlp.fc_out.weight
transformer.h.26.mlp.fc_out.bias
transformer.h.26.mlp.fc_in.weight
transformer.h.26.mlp.fc_in.bias
transformer.h.26.attn.out_proj.weight
transformer.h.26.attn.k_proj.weight
transformer.h.26.attn.v_proj.weight
transformer.h.26.attn.q_proj.weight
transformer.h.25.ln_1.weight
transformer.h.25.ln_1.bias
transformer.h.25.mlp.fc_out.weight
transformer.h.25.mlp.fc_out.bias
transformer.h.25.mlp.fc_in.weight
transformer.h.25.mlp.fc_in.bias
transformer.h.25.attn.out_proj.weight
transformer.h.25.attn.k_proj.weight
transformer.h.25.attn.v_proj.weight
transformer.h.25.attn.q_proj.weight
transformer.h.24.ln_1.weight
transformer.h.24.ln_1.bias
transformer.h.24.mlp.fc_out.weight
transformer.h.24.mlp.fc_out.bias
transformer.h.24.mlp.fc_in.weight
transformer.h.24.mlp.fc_in.bias
transformer.h.24.attn.out_proj.weight
transformer.h.24.attn.k_proj.weight
transformer.h.24.attn.v_proj.weight
transformer.h.24.attn.q_proj.weight
transformer.h.23.ln_1.weight
transformer.h.23.ln_1.bias
transformer.h.23.mlp.fc_out.weight
transformer.h.23.mlp.fc_out.bias
transformer.h.23.mlp.fc_in.weight
transformer.h.23.mlp.fc_in.bias
transformer.h.23.attn.out_proj.weight
transformer.h.23.attn.k_proj.weight
transformer.h.23.attn.v_proj.weight
transformer.h.23.attn.q_proj.weight
transformer.h.22.ln_1.weight
transformer.h.22.ln_1.bias
transformer.h.22.mlp.fc_out.weight
transformer.h.22.mlp.fc_out.bias
transformer.h.22.mlp.fc_in.weight
transformer.h.22.mlp.fc_in.bias
transformer.h.22.attn.out_proj.weight
transformer.h.22.attn.k_proj.weight
transformer.h.22.attn.v_proj.weight
transformer.h.22.attn.q_proj.weight
transformer.h.21.ln_1.weight
transformer.h.21.ln_1.bias
transformer.h.21.mlp.fc_out.weight
transformer.h.21.mlp.fc_out.bias
transformer.h.21.mlp.fc_in.weight
transformer.h.21.mlp.fc_in.bias
transformer.h.21.attn.out_proj.weight
transformer.h.21.attn.k_proj.weight
transformer.h.21.attn.v_proj.weight
transformer.h.21.attn.q_proj.weight
transformer.h.20.ln_1.weight
transformer.h.20.ln_1.bias
transformer.h.20.mlp.fc_out.weight
transformer.h.20.mlp.fc_out.bias
transformer.h.20.mlp.fc_in.weight
transformer.h.20.mlp.fc_in.bias
transformer.h.20.attn.out_proj.weight
transformer.h.20.attn.k_proj.weight
transformer.h.20.attn.v_proj.weight
transformer.h.20.attn.q_proj.weight
transformer.h.2.ln_1.weight
transformer.h.2.ln_1.bias
transformer.h.2.mlp.fc_out.weight
transformer.h.2.mlp.fc_out.bias
transformer.h.2.mlp.fc_in.weight
transformer.h.2.mlp.fc_in.bias
transformer.h.2.attn.out_proj.weight
transformer.h.2.attn.k_proj.weight
transformer.h.2.attn.v_proj.weight
transformer.h.2.attn.q_proj.weight
transformer.h.19.ln_1.weight
transformer.h.19.ln_1.bias
transformer.h.19.mlp.fc_out.weight
transformer.h.19.mlp.fc_out.bias
transformer.h.19.mlp.fc_in.weight
transformer.h.19.mlp.fc_in.bias
transformer.h.19.attn.out_proj.weight
transformer.h.19.attn.k_proj.weight
transformer.h.19.attn.v_proj.weight
transformer.h.19.attn.q_proj.weight
transformer.h.18.ln_1.weight
transformer.h.18.ln_1.bias
transformer.h.18.mlp.fc_out.weight
transformer.h.18.mlp.fc_out.bias
transformer.h.18.mlp.fc_in.weight
transformer.h.18.mlp.fc_in.bias
transformer.h.18.attn.out_proj.weight
transformer.h.18.attn.k_proj.weight
transformer.h.18.attn.v_proj.weight
transformer.h.18.attn.q_proj.weight
transformer.h.17.ln_1.weight
transformer.h.17.ln_1.bias
transformer.h.17.mlp.fc_out.weight
transformer.h.17.mlp.fc_out.bias
transformer.h.17.mlp.fc_in.weight
transformer.h.17.mlp.fc_in.bias
transformer.h.17.attn.out_proj.weight
transformer.h.17.attn.k_proj.weight
transformer.h.17.attn.v_proj.weight
transformer.h.17.attn.q_proj.weight
transformer.h.16.ln_1.weight
transformer.h.16.ln_1.bias
transformer.h.16.mlp.fc_out.weight
transformer.h.16.mlp.fc_out.bias
transformer.h.16.mlp.fc_in.weight
transformer.h.16.mlp.fc_in.bias
transformer.h.16.attn.out_proj.weight
transformer.h.16.attn.k_proj.weight
transformer.h.16.attn.v_proj.weight
transformer.h.16.attn.q_proj.weight
transformer.h.15.ln_1.weight
transformer.h.15.ln_1.bias
transformer.h.15.mlp.fc_out.weight
transformer.h.15.mlp.fc_out.bias
transformer.h.15.mlp.fc_in.weight
transformer.h.15.mlp.fc_in.bias
transformer.h.15.attn.out_proj.weight
transformer.h.15.attn.k_proj.weight
transformer.h.15.attn.v_proj.weight
transformer.h.15.attn.q_proj.weight
transformer.h.14.ln_1.weight
transformer.h.14.ln_1.bias
transformer.h.14.mlp.fc_out.weight
transformer.h.14.mlp.fc_out.bias
transformer.h.14.mlp.fc_in.weight
transformer.h.14.mlp.fc_in.bias
transformer.h.14.attn.out_proj.weight
transformer.h.14.attn.k_proj.weight
transformer.h.14.attn.v_proj.weight
transformer.h.14.attn.q_proj.weight
transformer.h.13.ln_1.weight
transformer.h.13.ln_1.bias
transformer.h.13.mlp.fc_out.weight
transformer.h.13.mlp.fc_out.bias
transformer.h.13.mlp.fc_in.weight
transformer.h.13.mlp.fc_in.bias
transformer.h.13.attn.out_proj.weight
transformer.h.13.attn.k_proj.weight
transformer.h.13.attn.v_proj.weight
transformer.h.13.attn.q_proj.weight
transformer.h.12.ln_1.weight
transformer.h.12.ln_1.bias
transformer.h.12.mlp.fc_out.weight
transformer.h.12.mlp.fc_out.bias
transformer.h.12.mlp.fc_in.weight
transformer.h.12.mlp.fc_in.bias
transformer.h.12.attn.out_proj.weight
transformer.h.12.attn.k_proj.weight
transformer.h.12.attn.v_proj.weight
transformer.h.12.attn.q_proj.weight
transformer.h.11.ln_1.weight
transformer.h.11.ln_1.bias
transformer.h.11.mlp.fc_out.weight
transformer.h.11.mlp.fc_out.bias
transformer.h.11.mlp.fc_in.weight
transformer.h.11.mlp.fc_in.bias
transformer.h.11.attn.out_proj.weight
transformer.h.11.attn.k_proj.weight
transformer.h.11.attn.v_proj.weight
transformer.h.11.attn.q_proj.weight
transformer.h.10.ln_1.weight
transformer.h.10.ln_1.bias
transformer.h.10.mlp.fc_out.weight
transformer.h.10.mlp.fc_out.bias
transformer.h.10.mlp.fc_in.weight
transformer.h.10.mlp.fc_in.bias
transformer.h.10.attn.out_proj.weight
transformer.h.10.attn.k_proj.weight
transformer.h.10.attn.v_proj.weight
transformer.h.10.attn.q_proj.weight
transformer.h.1.ln_1.weight
transformer.h.1.ln_1.bias
transformer.h.1.mlp.fc_out.weight
transformer.h.1.mlp.fc_out.bias
transformer.h.1.mlp.fc_in.weight
transformer.h.1.mlp.fc_in.bias
transformer.h.1.attn.out_proj.weight
transformer.h.1.attn.k_proj.weight
transformer.h.1.attn.v_proj.weight
transformer.h.1.attn.q_proj.weight
transformer.h.0.ln_1.weight
transformer.h.0.ln_1.bias
transformer.h.0.mlp.fc_out.weight
transformer.h.0.mlp.fc_out.bias
transformer.h.0.mlp.fc_in.weight
transformer.h.0.mlp.fc_in.bias
transformer.h.0.attn.out_proj.weight
transformer.h.0.attn.k_proj.weight
transformer.h.0.attn.v_proj.weight
transformer.h.0.attn.q_proj.weight
transformer.wte.weight
transformer.h.0.attn.bias
transformer.h.0.attn.masked_bias
/Users/parismorgan/repo/mlc-llm/mlc_llm/relax_model/gptj.py:711: RuntimeWarning: overflow encountered in cast
return [(torch_pname, raw_param.astype(dtype))]
transformer.h.1.attn.bias
transformer.h.1.attn.masked_bias
transformer.h.2.attn.bias
transformer.h.2.attn.masked_bias
transformer.h.3.attn.bias
transformer.h.3.attn.masked_bias
transformer.h.4.attn.bias
transformer.h.4.attn.masked_bias
transformer.h.5.attn.bias
transformer.h.5.attn.masked_bias
transformer.h.6.attn.bias
transformer.h.6.attn.masked_bias
transformer.h.7.attn.bias
transformer.h.7.attn.masked_bias
transformer.h.8.attn.bias
transformer.h.8.attn.masked_bias
transformer.h.9.attn.bias
transformer.h.9.attn.masked_bias
transformer.h.10.attn.bias
transformer.h.10.attn.masked_bias
transformer.h.11.attn.bias
transformer.h.11.attn.masked_bias
transformer.h.12.attn.bias
transformer.h.12.attn.masked_bias
transformer.h.13.attn.bias
transformer.h.13.attn.masked_bias
transformer.h.14.attn.bias
transformer.h.14.attn.masked_bias
transformer.h.15.attn.bias
transformer.h.15.attn.masked_bias
transformer.h.16.attn.bias
transformer.h.16.attn.masked_bias
transformer.h.17.attn.bias
transformer.h.17.attn.masked_bias
transformer.h.18.attn.bias
transformer.h.18.attn.masked_bias
transformer.h.19.attn.bias
transformer.h.19.attn.masked_bias
transformer.h.20.attn.bias
transformer.h.20.attn.masked_bias
transformer.h.21.attn.bias
transformer.h.21.attn.masked_bias
transformer.h.22.attn.bias
transformer.h.22.attn.masked_bias
transformer.h.23.attn.bias
transformer.h.23.attn.masked_bias
transformer.h.24.attn.bias
transformer.h.24.attn.masked_bias
transformer.h.25.attn.bias
transformer.h.25.attn.masked_bias
transformer.h.26.attn.bias
transformer.h.26.attn.masked_bias
transformer.h.27.attn.bias
transformer.h.27.attn.masked_bias
Finish computing and quantizing weights.
Total param size: 3.1714653372764587 GB
Start storing to cache dist/gpt-j-6b-q4f16_0/params
[0455/0455] saving param_454
All finished, 101 total shards committed, record saved to dist/gpt-j-6b-q4f16_0/params/ndarray-cache.json
Save a cached module to dist/gpt-j-6b-q4f16_0/mod_cache_before_build_webgpu.pkl.
[13:05:13] /Users/runner/work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Traceback (most recent call last):
File "/Users/parismorgan/repo/mlc-llm/build.py", line 470, in <module>
main()
File "/Users/parismorgan/repo/mlc-llm/build.py", line 462, in main
build(mod, ARGS)
File "/Users/parismorgan/repo/mlc-llm/build.py", line 412, in build
ex.export_library(lib_path, **args.export_kwargs)
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/relax/vm_build.py", line 147, in export_library
return self.mod.export_library(
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/runtime/module.py", line 598, in export_library
return fcompile(file_name, files, **kwargs)
File "/Users/parismorgan/virtualenvs/mlc-llm/lib/python3.9/site-packages/tvm/contrib/emcc.py", line 79, in create_tvmjs_wasm
raise RuntimeError(msg)
RuntimeError: Compilation error:
wasm-ld: error: initial memory too small, 298422128 bytes needed
emcc: error: '/usr/local/Cellar/emscripten/3.1.41/libexec/llvm/bin/wasm-ld -o dist/gpt-j-6b-q4f16_0/gpt-j-6b-q4f16_0-webgpu.wasm /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/wasm_runtime.bc /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/tvmjs_support.bc /Users/parismorgan/repo/mlc-llm/3rdparty/tvm/web/dist/wasm/webgpu_runtime.bc /var/folders/89/tw4l36q54g9bt_q0pzh8m36m0000gn/T/tmprguxn92d/lib0.o /var/folders/89/tw4l36q54g9bt_q0pzh8m36m0000gn/T/tmprguxn92d/devc.o -L/usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot/lib/wasm32-emscripten /usr/local/Cellar/emscripten/3.1.41/libexec/cache/sysroot/lib/wasm32-emscripten/crt1_reactor.o -lGL -lal -lhtml5 -lstandalonewasm-nocatch-memgrow -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /var/folders/89/tw4l36q54g9bt_q0pzh8m36m0000gn/T/tmpr6ci9ki4libemscripten_js_symbols.so --import-undefined --strip-debug --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=stackSave --export-if-defined=stackRestore --export-if-defined=stackAlloc --export-if-defined=__errno_location --export-table -z stack-size=65536 --initial-memory=37748736 --entry=_initialize --max-memory=2147483648 --global-base=1024' failed (returned 1)
It looks like this post explains a fix: https://stackoverflow.com/a/66069665/4979029 - but I haven't yet seen where to set that flag - perhaps you know?
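In case it helps, here is a rough sketch of where I imagine that setting could be threaded through. This is only a guess on my part: it assumes export_library here accepts an fcompile override and that tvm.contrib.emcc.create_tvmjs_wasm forwards an options list to emcc; the flag names are standard Emscripten settings and the values are placeholders, not a verified fix.

```python
# Rough, unverified sketch: wrap TVM's emcc helper so extra Emscripten settings
# reach wasm-ld, then pass the wrapper to export_library instead of the default.
from tvm.contrib import emcc

def fcompile(output, objects, **kwargs):
    emcc.create_tvmjs_wasm(
        output,
        objects,
        options=[
            "-s", "INITIAL_MEMORY=335544320",  # ~320 MB, above the 298422128 bytes wasm-ld asked for
            "-s", "ALLOW_MEMORY_GROWTH=1",     # allow the heap to grow at runtime
        ],
    )

# export_library consults this attribute to decide which object format to emit;
# mirroring what create_tvmjs_wasm declares is an assumption about this TVM version.
fcompile.object_format = "bc"

ex.export_library(lib_path, fcompile=fcompile)
```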
Also please let me know if you'd like me to create a new bug for this - thank you!
I believe the first error above is still happening, but it is not a blocker, and it is slightly different from the original bug I filed, so I'm going to close this issue and open a new one in the future if I need to. Thanks for your help here!
🐛 Bug
Based on the documentation it seems that gptj should be a supported model architecture. But when I try to build https://huggingface.co/EleutherAI/gpt-j-6b for WebGPU I get AssertionError: Model type gptj not supported.
To Reproduce
Steps to reproduce the behavior:
Run python3 build.py --hf-path EleutherAI/gpt-j-6b --target webgpu --quantization q4f32_0
and get the error AssertionError: Model type gptj not supported. This is because it's not a supported model prefix: supported_model_types = set(["llama", "gpt_neox", "moss", "rwkv"]). I could add it, i.e. supported_model_types = set(["llama", "gpt_neox", "moss", "rwkv", "gptj"]), but I stopped here because I wasn't sure if just adding in something made sense.
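For reference, here is a minimal sketch of the change I had in mind; the variable holding the parsed model type and the surrounding dispatch logic in build.py are assumptions on my part, and gptj would presumably also need a Relax model implementation in mlc_llm/relax_model, so this one-liner alone may not be enough:

```python
# Hypothetical excerpt of build.py: the set matches the error above, but the
# "model_category" variable is made up for illustration.
supported_model_types = set(["llama", "gpt_neox", "moss", "rwkv", "gptj"])

assert model_category in supported_model_types, (
    f"Model type {model_category} not supported."
)
```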
Expected behavior
Perhaps the docs could use an update on what is supported when building for WebGPU? Or am I just misunderstanding how things work and they are clear? Or perhaps it is expected that this would work and it is a bug?
Environment
How you installed MLC-LLM (conda, source): Cloned https://github.com/mlc-ai/mlc-llm
How you installed TVM-Unity (pip, source): pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
Additional context
Please let me know if there is something I missed in the docs about this, thank you for any help!