mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0
12.38k stars 781 forks

wasm optimization? #494

Open 137591 opened 2 months ago

137591 commented 2 months ago

While running the web-llm example (path: /web-llm/examples/simple-chat) and reading the source file (@mlc-ai/web-llm/lib/index.js), I noticed a lot of interaction with wasm files, which makes the source code hard to follow. Could you explain what each of the wasm files contains? I would be very grateful. I also noticed that there seems to be room for optimization in the model library files (for example: "model_lib_url": modelLibURLPrefix + modelVersion + "/Llama-3-8B-Instruct-q4f32_1-ctx1k_cs1k-webgpu.wasm"). If I want to optimize these, should I do it by modifying the TVM compilation process?

CharlieFRuan commented 2 months ago

Thanks for the question! The wasm is composed of several parts, including the model's kernels (written in WGSL) and runtime support (C++ code compiled to WASM).
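If it helps with reading the code, one way to see where the boundary between a wasm file and the JavaScript side sits is to list the module's imports and exports with the standard `WebAssembly.Module.imports` / `WebAssembly.Module.exports` reflection calls. The snippet below is only a sketch: `describeModule` and `ADD_MODULE_BYTES` are hypothetical names made up for illustration, and the bytes are a tiny hand-written module with a single exported `add` function, not a real model library. The listing shows function, memory, and table names only; it will not show the WGSL kernel sources themselves.

```typescript
// Hypothetical helper (not part of web-llm): summarize what a compiled
// WebAssembly module expects from its host (imports) and offers to it (exports).
interface ModuleSummary {
  imports: string[]; // entries formatted as "kind module.name"
  exports: string[]; // entries formatted as "kind name"
}

function describeModule(module: WebAssembly.Module): ModuleSummary {
  // Both calls are standard static reflection methods on WebAssembly.Module.
  const imports = WebAssembly.Module.imports(module).map(
    (d) => `${d.kind} ${d.module}.${d.name}`
  );
  const exports = WebAssembly.Module.exports(module).map(
    (d) => `${d.kind} ${d.name}`
  );
  return { imports, exports };
}

// A tiny stand-in module (assumption: used here only so the sketch runs
// without downloading anything). It exports one function, add(i32, i32) -> i32.
const ADD_MODULE_BYTES = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
]);

const summary = describeModule(new WebAssembly.Module(ADD_MODULE_BYTES));
console.log(summary);
```

For a real model library you would compile the downloaded bytes instead, for example with `WebAssembly.compile(await response.arrayBuffer())` after a `fetch` of the wasm URL, and pass the resulting module to the same helper.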