mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Fetching model param super slow on Vercel #393

Open · louis030195 opened this issue 1 month ago

louis030195 commented 1 month ago

[Screenshot 2024-05-12 at 17:28:50]

For some reason it takes ~3 seconds to load the model locally and ~30 minutes on Vercel.

Stupid question, but isn't the model downloaded on the client side? So why would the network make any difference?

PS: A related question: is there a best practice to avoid reloading the model every time I switch pages? I'm currently using a React hook to load the model.

Neet-Nestor commented 1 month ago

@louis030195

Regarding the slow loading on Vercel, I'm not sure about the cause, and I haven't run into this issue myself. We have WebLLM Chat deployed both on GitHub Pages (https://chat.webllm.ai) and on Vercel (https://chat.neet.coffee/), and the loading speeds are similar. Could you try both and see whether you hit the same issue?

For your second question, there isn't a good solution: for security and performance reasons, browsers typically don't allow website state to persist after all pages have been closed. One workaround is to use a Service Worker, as WebLLM Chat does right now; it allows the WebLLM engine and its internal state to persist in a worker thread even after the pages have been closed. However, this is not fully reliable, since the browser may still kill the worker at any moment.

For a Service Worker implementation, you can reference examples/service-worker or WebLLM Chat.
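
For illustration, here is a minimal sketch of that Service Worker pattern. It assumes the `ServiceWorkerMLCEngineHandler` and `CreateServiceWorkerMLCEngine` exports of recent `@mlc-ai/web-llm` versions (older releases use slightly different names), a prebuilt model id, and hypothetical file names `sw.ts` / `main.ts`; treat it as a sketch rather than the exact code in examples/service-worker.

```ts
// sw.ts — runs inside the Service Worker and keeps the engine alive across page loads.
import { ServiceWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

let handler: ServiceWorkerMLCEngineHandler | undefined;

self.addEventListener("activate", () => {
  // The handler receives requests from pages and forwards them to the MLCEngine
  // living in this worker, so the loaded model can survive page navigation.
  handler = new ServiceWorkerMLCEngineHandler();
});
```

```ts
// main.ts — page side; connects to (or reuses) the engine held by the Service Worker.
import { CreateServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

// Register the worker once; later page loads reuse the already-registered worker.
await navigator.serviceWorker.register("/sw.js", { type: "module" });

const engine = await CreateServiceWorkerMLCEngine(
  "Llama-3-8B-Instruct-q4f32_1-MLC", // any prebuilt model id
  { initProgressCallback: (report) => console.log(report.text) },
);

// OpenAI-style chat API; if the worker already holds the model, this returns quickly.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0]?.message.content);
```

In a React app, the page-side call can live in a top-level hook or context provider; since the weights are cached by the browser after the first download and the engine state lives in the worker, switching pages should not trigger a full reload.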