ngxson / wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

calling .exit() has unexpected result: wllamaExit is not a function #121

Open · flatsiedatsie opened 1 month ago

flatsiedatsie commented 1 month ago
[Screenshot 2024-09-29 at 14:29:21: the "wllamaExit is not a function" error]

I verified that the exit function exists, but calling it results in the error shown above.

try{
    if(window.llama_cpp_model_being_loaded){
        // A model is still in the middle of loading: unload it rather than exiting outright.
        if(typeof window.llama_cpp_app.unloadModel === 'function'){
            await window.llama_cpp_app.unloadModel();
        }else{
            console.error("window.llama_cpp_app was not null, but had no unloadModel function?  window.llama_cpp_app: ", window.llama_cpp_app);
        }
    }
    else{
        // Nothing is loading: shut the whole instance down.
        if(typeof window.llama_cpp_app.exit === 'function'){
            await window.llama_cpp_app.exit();
        }else{
            console.error("window.llama_cpp_app was not null, but had no exit function?  window.llama_cpp_app: ", window.llama_cpp_app);
        }
    }
}
catch(err){
    console.error("caught error trying to stop/unload Wllama: ", err);
}
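For what it's worth, my current guess is that calling exit() before any model has been loaded could trigger this, since the WASM-side wllamaExit export presumably only exists once the module has been instantiated. A minimal sketch of what I think reproduces it (assuming the published @wllama/wllama package and the WLLAMA_CONFIG_PATHS constant from the examples; I haven't confirmed this is the actual cause):

import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(WLLAMA_CONFIG_PATHS);
// No loadModel() call yet, so (I assume) the WASM module was never instantiated.
await wllama.exit(); // -> "wllamaExit is not a function"?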
flatsiedatsie commented 1 month ago

I just noticed something while doing mobile debugging. I couldn't figure out why the smallest model (Danube 3 500M, 320MB) wasn't working. According to my debug information the Wllama object was null (which is what it normally gets set to after exit() succeeds).

On mobile the memory debugging was working, so I noticed that Wllama's multi-thread workers still seemed to exist?

[Screenshot 2024-09-29 at 14:58:36: wllama's worker threads still listed in the debugger]

Maybe I should never set the Wllama object back to null.

Is there another sure-fire way to fully destroy wllama? The UI allows users to load all kinds of models, some of which are handled by Wllama, while others are handled by WebLLM or even Transformers.js. I try to juggle these 'runners' memory-wise, only allowing one of them to exist at a time.
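In rough JavaScript the juggling logic looks like this (a sketch of my own code; switch_runner and the runner names are mine):

// Sketch: make sure at most one inference runner holds memory at a time.
async function switch_runner(next_runner){
    if(next_runner != 'wllama' && window.llama_cpp_app != null){
        try{
            await window.llama_cpp_app.exit(); // should stop the workers and free the model
        }catch(err){
            console.error("could not destroy wllama: ", err);
        }
        window.llama_cpp_app = null; // the step that may be causing trouble
    }
    // ...similar teardown for WebLLM / Transformers.js, then initialize next_runner
}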

At least, that was the theory...

ngxson commented 1 month ago

Maybe I should never set the Wllama object back to null.

You can set it to a new wllama instance instead of setting it to null.

flatsiedatsie commented 1 month ago

You can set it to a new wllama instance instead of setting to null

Thanks. Will that kill the workers and unload the model to release the memory properly?

I just noticed that resetWllamaInstance effectively does what you describe.

const resetWllamaInstance = () => {
  wllamaInstance = new Wllama(WLLAMA_CONFIG_PATHS, { logger: DebugLogger });
};
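So combining the two suggestions, something like this should give a clean slate while keeping a live object around for housekeeping (my own sketch, reusing the names from my code above):

async function recycle_wllama(){
    try{
        await window.llama_cpp_app.exit(); // stop the workers and free the loaded model
    }catch(err){
        console.error("exit failed, replacing the instance anyway: ", err);
    }
    // Replace the spent instance with a fresh, empty one instead of null.
    window.llama_cpp_app = new Wllama(WLLAMA_CONFIG_PATHS, { logger: DebugLogger });
}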

For now I've modified the code so it no longer sets the Wllama instance to null, and just unloads the model when it's WebLLM's turn. Or could that result in memory not being fully recovered?

flatsiedatsie commented 1 month ago

I'm running into a situation where await wllama.exit() hangs: code similar to that in the first post never gets past it.

[Screenshot 2024-10-01 at 15:53:31]

I'm trying to unload the old model before loading a new one.

if(typeof window.llama_cpp_app.isModelLoaded != 'undefined'){
    let a_model_is_loaded = await window.llama_cpp_app.isModelLoaded();
    console.warn("WLLAMA: need to unload a model first?: ", a_model_is_loaded, window.llama_cpp_app);
    if(a_model_is_loaded && typeof window.llama_cpp_app.unloadModel != 'undefined'){
        console.log("wllama: unloading loaded model first.  window.llama_cpp_app: ", window.llama_cpp_app);
        await window.llama_cpp_app.unloadModel();
    }
    else if(a_model_is_loaded && typeof window.llama_cpp_app.exit != 'undefined'){
        // Fallback: no unloadModel available, so tear the whole instance down.
        console.error("wllama: unloading loaded model first by calling exit instead of unloadModel.  window.llama_cpp_app: ", window.llama_cpp_app);
        await window.llama_cpp_app.exit();
        console.log("wllama exited.  window.llama_cpp_app is now: ", window.llama_cpp_app);
    }
    else if(a_model_is_loaded){
        console.error("WLLAMA HAS A MODEL LOADED, BUT NO WAY TO UNLOAD IT?  window.llama_cpp_app: ", window.llama_cpp_app);
        return false;
    }
    create_wllama_object(); // TODO: potential memory leak if the old model isn't unloaded properly first
}
else{
    console.error("llama_cpp_app has no isModelLoaded: ", window.llama_cpp_app);
}

I see "wllama: unloading loaded model first by calling exit instead of unloadModel" in the console, but I never see "wllama exited".
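As a stopgap I'm considering racing exit() against a timeout, so a hung exit() can't block the rest of the unload flow (a sketch; the five-second budget is arbitrary):

// Reject if the wrapped promise doesn't settle within ms milliseconds.
function with_timeout(promise, ms){
    return Promise.race([
        promise,
        new Promise((resolve, reject) => setTimeout(() => reject(new Error('timed out')), ms))
    ]);
}

try{
    await with_timeout(window.llama_cpp_app.exit(), 5000);
    console.log("wllama exited");
}catch(err){
    console.error("wllama exit hung or failed: ", err);
}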

flatsiedatsie commented 1 month ago

The reason I ask is that I've read that Mobile Safari doesn't clean up orphaned web workers properly.

I'm now attempting this:

if(typeof window.llama_cpp_app.proxy != 'undefined' && window.llama_cpp_app.proxy != null && typeof window.llama_cpp_app.proxy.worker != 'undefined'){
    console.warn("wllama.proxy still existed, attempting to terminate it manually");
    window.llama_cpp_app.proxy.worker.terminate();
}
flatsiedatsie commented 1 month ago

Calling window.llama_cpp_app.proxy.worker.terminate(); has been working well for now.
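For completeness, this is the full teardown sequence I've ended up with (a sketch; proxy and worker look like internals, so the guards assume they may not exist in other versions):

async function destroy_wllama(app){
    try{
        // Only exit if a model was actually loaded, to avoid the original error.
        if(typeof app.isModelLoaded === 'function' && await app.isModelLoaded()){
            await app.exit();
        }
    }catch(err){
        console.error("wllama exit failed: ", err);
    }
    // Last-resort cleanup for Mobile Safari's orphaned-worker problem.
    if(app.proxy && app.proxy.worker){
        app.proxy.worker.terminate();
    }
}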

I'll leave this issue open because I'm curious what the recommended route for unloading models is, and how memory can be optimally recovered while keeping an instance of Wllama alive for housekeeping tasks.