flatsiedatsie opened 1 month ago
I just noticed something while doing mobile debugging. I couldn't figure out why the smallest model (Danube 3 500M, 320MB) wasn't working. According to my debug information the Wllama object was null (which it normally only gets set to after calling `exit()` successfully).
On mobile the memory debugging was working, so I could see that Wllama's multi-thread workers still seemed to exist?
Maybe I should never set the Wllama object back to null.
Is there another sure-fireway to fully destroy wllama? The UI allows users to load all kinds of models, some of which are handled by Wllama, but others are handled by WebLLM or even Transformers.js. I try to juggle these 'runners' memory-wise, only allowing one of them to exist at a time.
At least, that was the theory...
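For what it's worth, the "only one runner alive at a time" rule can be centralized in a small manager so no call site forgets to tear the previous runner down before creating the next one. This is only a generic sketch: the `RunnerManager` class, the `unload()` method name, and the mock runners are my own placeholders, not wllama or WebLLM APIs.

```js
// Sketch: a registry that guarantees at most one active runner.
// Each runner is assumed to expose an async unload() that releases
// its memory (for wllama that would be exit(); names are placeholders).
class RunnerManager {
  constructor() {
    this.active = null; // the single live runner, or null
  }

  // Tear down whatever is running, then activate the new runner.
  async swapTo(makeRunner) {
    if (this.active !== null) {
      await this.active.unload(); // release the old runner's memory first
      this.active = null;
    }
    this.active = await makeRunner();
    return this.active;
  }
}

// Usage with mock runners standing in for wllama / WebLLM:
const log = [];
const makeMock = (name) => async () => ({
  name,
  unload: async () => { log.push(`${name} unloaded`); },
});

const manager = new RunnerManager();
(async () => {
  await manager.swapTo(makeMock('wllama'));
  await manager.swapTo(makeMock('webllm'));
  console.log(log); // [ 'wllama unloaded' ] — old runner torn down before the swap
})();
```

The point of the indirection is that the "unload before create" ordering lives in exactly one place instead of being repeated at every model-switch site.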
> Maybe I should never set the Wllama object back to null.
You can set it to a new wllama instance instead of setting to null
> You can set it to a new wllama instance instead of setting to null
Thanks. Will that kill the workers and unload the model to release the memory properly?
I just noticed that `resetWllamaInstance` effectively does what you describe.
```js
const resetWllamaInstance = () => {
    wllamaInstance = new Wllama(WLLAMA_CONFIG_PATHS, { logger: DebugLogger });
};
```
For now I've modified the code to no longer null the Wllama instance, and to just unload the model when it's WebLLM's turn. Or could that result in memory not being fully recovered?
I'm running into a situation where `await wllama.exit()` is stuck. The code (similar to that in the first post) doesn't get beyond it.
I'm trying to unload the old model before loading a new one.
```js
if (typeof window.llama_cpp_app.isModelLoaded != 'undefined') {
    let a_model_is_loaded = await window.llama_cpp_app.isModelLoaded();
    console.warn("WLLAMA: need to unload a model first?: ", a_model_is_loaded, window.llama_cpp_app);
    if (a_model_is_loaded && typeof window.llama_cpp_app.unloadModel != 'undefined') {
        console.log("wllama: unloading loaded model first. window.llama_cpp_app: ", window.llama_cpp_app);
        await window.llama_cpp_app.unloadModel();
    }
    else if (a_model_is_loaded && typeof window.llama_cpp_app.exit != 'undefined') {
        console.error("wllama: unloading loaded model first by calling exit instead of unloadModel. window.llama_cpp_app: ", window.llama_cpp_app);
        await window.llama_cpp_app.exit();
        console.log("wllama exited. window.llama_cpp_app is now: ", window.llama_cpp_app);
    }
    else if (a_model_is_loaded) {
        console.error("WLLAMA HAS A MODEL LOADED, BUT NO WAY TO UNLOAD IT? window.llama_cpp_app: ", window.llama_cpp_app);
        return false;
    }
    create_wllama_object(); // TODO: potential memory leak if the old model isn't unloaded properly first
}
else {
    console.error("llama_cpp_app has no isModelLoaded: ", window.llama_cpp_app);
}
```
This happens: `wllama: unloading loaded model first by calling exit instead of unloadModel`. But I never see `wllama exited`.
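As a defensive workaround for a call that never resolves, I could race `exit()` against a timeout, so the caller can fall through to more drastic cleanup instead of awaiting forever. A generic sketch (`withTimeout` is my own helper, not part of wllama, and the timeout value is arbitrary):

```js
// Sketch: resolve with the promise's result, or reject after `ms` milliseconds.
// Note: this does not cancel the underlying work, it only stops waiting for it.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage around the stuck call:
// try {
//   await withTimeout(window.llama_cpp_app.exit(), 5000, 'wllama.exit');
// } catch (err) {
//   console.error('exit() did not finish, falling back to manual cleanup', err);
// }
```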
The reason I ask is that I've read that Mobile Safari doesn't clean up orphaned web workers properly.
I'm now attempting this:

```js
if (typeof window.llama_cpp_app.proxy != 'undefined' && window.llama_cpp_app.proxy != null && typeof window.llama_cpp_app.proxy.worker != 'undefined') {
    console.warn("wllama.proxy still existed, attempting to terminate it manually");
    window.llama_cpp_app.proxy.worker.terminate();
}
```
Calling `window.llama_cpp_app.proxy.worker.terminate();` has been working well for now.
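For reference, the terminate-as-fallback flow can be sketched as a small helper. Two caveats: `proxy.worker` looks like an undocumented internal that may change between wllama versions, so every access is guarded; and the mock object below only stands in for `window.llama_cpp_app`, it is not the real API.

```js
// Sketch: best-effort teardown. Try the official exit(), and if the
// internal proxy worker still exists afterwards, terminate it by hand.
// (If exit() itself can hang, wrap it in a timeout before falling back.)
async function hardStop(app) {
  try {
    if (typeof app.exit === 'function') {
      await app.exit();
    }
  } catch (err) {
    console.warn('exit() failed, will try terminating the worker', err);
  }
  const worker = app.proxy && app.proxy.worker;
  if (worker && typeof worker.terminate === 'function') {
    worker.terminate(); // orphaned-worker workaround for Mobile Safari
    return 'terminated';
  }
  return 'exited';
}

// Usage with a mock standing in for window.llama_cpp_app:
const mockApp = {
  exit: async () => {},                  // pretend exit() succeeds...
  proxy: { worker: { terminate() {} } }, // ...but the worker lingers
};
hardStop(mockApp).then((how) => console.log(how)); // logs "terminated"
```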
I'll leave this issue open because I'm curious what the recommended route for unloading models is, and how memory can be optimally recovered while keeping an instance of Wllama alive for housekeeping tasks.
I verified that the `exit` function exists, but calling it gets stuck as described above.