mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Is it possible to store a loaded engine in react to avoid multiple reloads when refreshing the app? #636

Open jvjmarinello opened 1 week ago

jvjmarinello commented 1 week ago

I'm working with the web-llm library to load and use models like TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k. My current implementation downloads the model successfully and caches it using engine.reload, which appears to use IndexedDB under the hood (correct me if I'm wrong). Here's the relevant code snippet:

```js
const modelId = "TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k";
const webllm = await import("https://esm.run/@mlc-ai/web-llm");
const engine = new webllm.MLCEngine();

engine.setInitProgressCallback((report) => {
  console.log(`Loading ${modelId}: ${report.text}`);
});

await engine.reload(modelId, defaultModelParameters);
```

Now I want to preload this engine once and make it available across my entire React app, without reloading it on every render. Is that possible? How and where would you store the pre-loaded engine(s)? I was trying to use hooks or context to manage this, ensuring that the engine is initialized once and shared across all components without reloading the model multiple times. I tried to store the engine in localStorage, but I get an error due to circular references.

What’s the best way to achieve this in React?

stippi2 commented 3 days ago

The model weights are downloaded by WebLLM and stored in IndexedDB or the browser Cache under the origin of your page. When you initialize the engine, it loads the model into (V)RAM. When you say you tried to store "the engine" in localStorage, that can't work: localStorage only holds strings, and the engine is a live object holding GPU/RAM state, not serializable data. What you really need is to keep the engine instance in a React Context and make it available to all parts of your app through a React Hook. This is not specific to WebLLM; you should read up on React Contexts.

Preserving the engine across page reloads is impossible, since a reload completely resets the JavaScript execution context. However, you can preserve the engine across Hot Module Reloads while developing, when only parts of the app get replaced; this works even when hot-swapping the context itself. The only other way to keep the model loaded in RAM is a Shared Worker, but a Shared Worker only lives for as long as there are tabs open that reference it.
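A minimal sketch of the Context/Hook approach, assuming the npm package `@mlc-ai/web-llm` instead of the esm.run import; the names `EngineProvider` and `useEngine` are made up for the example, and the ref guard is only there to avoid a double `reload()` under React StrictMode:

```tsx
// EngineProvider.tsx — sketch only; adapt model id and config to your app
import React, { createContext, useContext, useEffect, useRef, useState } from "react";
import { MLCEngine } from "@mlc-ai/web-llm";

const modelId = "TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k";

// Holds the ready engine, or null while it is still loading.
const EngineContext = createContext<MLCEngine | null>(null);

export function EngineProvider({ children }: { children: React.ReactNode }) {
  const [engine, setEngine] = useState<MLCEngine | null>(null);
  const startedRef = useRef(false); // guard against double-init in StrictMode

  useEffect(() => {
    if (startedRef.current) return;
    startedRef.current = true;

    const instance = new MLCEngine();
    instance.setInitProgressCallback((report) => {
      console.log(`Loading ${modelId}: ${report.text}`);
    });
    // reload() pulls the cached weights and loads them into (V)RAM once.
    instance.reload(modelId).then(() => setEngine(instance));
  }, []);

  return <EngineContext.Provider value={engine}>{children}</EngineContext.Provider>;
}

// Any component can call this to get the shared engine (null while loading).
export function useEngine() {
  return useContext(EngineContext);
}
```

Wrap your app in `<EngineProvider>` once, and have components call `useEngine()` and render a loading state while it returns null. The model is then loaded exactly once per page load, no matter how many components use it.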

This issue should probably be closed, as it doesn't really have anything to do with WebLLM, but rather with how to work with React Contexts and/or HMR.

stippi2 commented 3 days ago

Also, have you noticed this section in the main README? It describes how to keep the engine alive across page reloads by hosting it in a Service Worker: https://github.com/mlc-ai/web-llm?tab=readme-ov-file#use-service-worker
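Roughly, the setup from that README section looks like this (a sketch from memory, so the exact API may differ; check the linked docs and your bundler's support for module service workers):

```ts
// sw.ts — the engine lives in the service worker, so it can outlive a page reload
import { ServiceWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

let handler: ServiceWorkerMLCEngineHandler;
self.addEventListener("activate", () => {
  handler = new ServiceWorkerMLCEngineHandler();
});
```

```ts
// main.ts — the page talks to the engine running in the service worker
import { CreateServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

await navigator.serviceWorker.register("/sw.js", { type: "module" });

const engine = await CreateServiceWorkerMLCEngine(
  "TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k",
  { initProgressCallback: (report) => console.log(report.text) },
);
```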