Thanks for the question! The `.wasm` is composed of various parts, including the kernel of the model (in WGSL) and runtime support (C++ code compiled into WASM).
- The kernel is implemented in MLC-LLM and compiled to WGSL: https://llm.mlc.ai/docs/deploy/webllm.html#bring-your-own-model-library
- Runtime support from MLC-LLM: https://github.com/mlc-ai/mlc-llm/blob/main/web/emcc/mlc_wasm_runtime.cc
- Runtime support from TVM (one of the three files): https://github.com/apache/tvm/blob/main/web/emcc/wasm_runtime.cc
The kernel and the runtime support (compiled into `.bc` files) are then linked together to form the final `.wasm` file: https://github.com/apache/tvm/blob/main/python/tvm/contrib/emcc.py
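To make that last step concrete, here is a minimal sketch of how the emcc.py helper is typically invoked. It assumes the MLC-LLM compile flow has already produced a `tvm.runtime.Module`; `compiled_model` is just a placeholder name for illustration, not a variable from the actual sources:

```python
import tvm
from tvm.contrib import emcc

# Placeholder: in the real pipeline this runtime.Module comes out of the
# MLC-LLM compile flow (model kernels lowered to WGSL plus host glue code).
compiled_model: tvm.runtime.Module = ...

# emcc.create_tvmjs_wasm drives emcc: it takes the object emitted from the
# module above, pulls in the pre-built runtime-support bitcode (the TVM and
# MLC-LLM runtime .bc files), and links everything into a single .wasm.
compiled_model.export_library(
    "Llama-3-8B-Instruct-q4f32_1-ctx1k_cs1k-webgpu.wasm",
    fcompile=emcc.create_tvmjs_wasm,
)
```

The resulting file is the model library that WebLLM later fetches at runtime (e.g. via `model_lib_url`).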
When I use the web-llm example (path: /web-llm/examples/simple-chat) and look at the source file (@mlc-ai/web-llm/lib/index.js), I notice a lot of interaction with wasm files, which makes the source code somewhat difficult to read. I would be very grateful if you could explain the logical content of all the wasm files!

Additionally, it looks like there may be room for optimization in how the model library files are handled (for example: "model_lib_url": modelLibURLPrefix + modelVersion + "/Llama-3-8B-Instruct-q4f32_1-ctx1k_cs1k-webgpu.wasm"). Should I optimize by modifying the TVM compilation process?