mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Loading LLM inside Electron window is very slow at the Compiling GPU Shader on Windows #621

Open StevenHanbyWilliams opened 1 day ago

StevenHanbyWilliams commented 1 day ago

Hey, loving the project, really cool stuff.

I ran into an issue while trying to wrap Electron around web-llm. After the model parameters are loaded, it gets stuck for several minutes (6 to 10) at

Loading GPU shader modules[73/74]: 98% completed, 3 secs elapsed.

I'm not seeing any error messages, and the LLM does eventually load, but it's stuck there for a while, even with a smaller model (Llama 3.2 1B). We see this both with our own locally served application (which does not use a service worker) and when simply pointing Electron at chat.webllm.ai (which, judging from the console logs, appears to use a service worker).
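The progress line above reports "3 secs elapsed" even though the stall lasts minutes, presumably because no new report arrives while the engine is stuck. A minimal sketch for locating the stall is to stamp each progress report with wall-clock time and find the longest gap between consecutive reports. It assumes web-llm's progress reports carry `progress`, `timeElapsed`, and `text` fields, mirrored here by a hypothetical `ProgressReport` interface; `ProgressTimeline` and `longestGap` are illustrative names, not part of web-llm.

```typescript
// Assumed shape of the reports web-llm passes to its init progress callback.
interface ProgressReport {
  progress: number;    // fraction completed, 0..1
  timeElapsed: number; // seconds, as reported by the engine itself
  text: string;        // e.g. "Loading GPU shader modules[73/74]: ..."
}

// One report stamped with the wall-clock time it was received.
interface StampedReport {
  report: ProgressReport;
  receivedAtMs: number;
}

// Hypothetical helper: records every progress report with a wall-clock
// timestamp so that long silent gaps (stalls) can be located afterwards.
class ProgressTimeline {
  private readonly entries: StampedReport[] = [];
  private readonly now: () => number;

  // The clock is injectable so the logic can be tested without a GPU.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Arrow function keeps `this` bound when passed around as a callback.
  readonly onProgress = (report: ProgressReport): void => {
    this.entries.push({ report, receivedAtMs: this.now() });
  };

  // Returns the text of the report that was followed by the longest
  // wall-clock gap, plus that gap in ms; null if fewer than two reports.
  longestGap(): { after: string; gapMs: number } | null {
    let best: { after: string; gapMs: number } | null = null;
    for (let i = 1; i < this.entries.length; i++) {
      const gapMs = this.entries[i].receivedAtMs - this.entries[i - 1].receivedAtMs;
      if (best === null || gapMs > best.gapMs) {
        best = { after: this.entries[i - 1].report.text, gapMs };
      }
    }
    return best;
  }
}
```

In the app, `timeline.onProgress` would be passed as the `initProgressCallback` option when creating the engine (for example `CreateMLCEngine(modelId, { initProgressCallback: timeline.onProgress })`), and `timeline.longestGap()` inspected once loading finishes. Comparing the result between Electron and a regular browser would show whether the extra minutes really sit after the shader-module step.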

We've verified that Electron is using the high-performance GPU (RTX 5000), both by checking navigator.gpu.requestAdapter and in Task Manager. Both our own locally served application and chat.webllm.ai also work perfectly in standard browsers: Edge, Chrome, Brave, and Chromium 132 all load extremely fast. So we don't think this is an OS, driver, or hardware issue; it is more likely something in Electron. I'm asking here in the hope that someone can point me in the right direction for digging deeper and debugging why this happens.
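One way to repeat the adapter check from a script is sketched below. It assumes the `GPUAdapter.info` attribute available in recent Chromium (Electron 33 bundles Chrome 130) and requests the adapter with `powerPreference: "high-performance"`, which is an assumption about how the original check was done. The `GpuLike` and `AdapterLike` interfaces and both functions are hypothetical, kept minimal so the snippet compiles without the WebGPU type package.

```typescript
// Hypothetical minimal types mirroring the GPUAdapterInfo fields read here.
interface AdapterInfoLike {
  vendor: string;
  architecture: string;
  device: string;
  description: string;
}

interface AdapterLike {
  info: AdapterInfoLike; // GPUAdapter.info in recent Chromium
}

interface GpuLike {
  requestAdapter(options?: {
    powerPreference?: "low-power" | "high-performance";
  }): Promise<AdapterLike | null>;
}

// Joins the non-empty adapter info fields into one readable line.
function formatAdapterInfo(info: AdapterInfoLike): string {
  const parts = [info.vendor, info.architecture, info.device, info.description]
    .filter((s) => s.length > 0);
  return parts.length > 0 ? parts.join(" | ") : "(adapter info is empty)";
}

// Requests a high-performance adapter and describes it, or explains why
// no adapter could be described.
async function describeAdapter(gpu: GpuLike | undefined): Promise<string> {
  if (!gpu) return "WebGPU is not available (navigator.gpu is undefined)";
  const adapter = await gpu.requestAdapter({ powerPreference: "high-performance" });
  if (!adapter) return "requestAdapter() returned null";
  return formatAdapterInfo(adapter.info);
}
```

In the renderer, `describeAdapter(navigator.gpu).then(console.log)` can be run in both Electron and a regular Chrome window to confirm they report the same adapter. Chromium may leave some `GPUAdapterInfo` fields empty, which is why empty strings are filtered out.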

This only happens on Windows; macOS works completely fine.

System information

OS: Windows 11
GPU: RTX 5000
Drivers: NVIDIA 550 and 566 (both show the same issue)

Electron dependency versions:
chrome-version: 130.0.6723.59
node-version: 20.18.0
electron-version: 33.0.2

Repro gist:

https://gist.github.com/StevenHanbyWilliams/b8bd2f41fcaef13b9f61db5be3a9e65d

WebGPUReport.org output:

[Screenshot 2024-10-30 175739]

Iternal-JBH4 commented 1 day ago

+1. $2,000 BOUNTY, paid in Bitcoin, to the person (or split among a team) who provides a complete solution and resolution before December 14.