ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

How to best use allow_offline? #122

Closed: flatsiedatsie closed this 1 month ago

flatsiedatsie commented 1 month ago

I've currently set allow_offline to true, but I'm still having trouble when WiFi is not available.

[Screenshots: the errors shown when WiFi is unavailable (2024-09-29)]

The needed files are all definitely in the cache:

[Screenshot: the model files listed in the browser cache (2024-09-29)]
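(For reference, the same check can be done from the DevTools console with the standard Cache Storage API. This is generic browser code, not anything wllama-specific:)

// Generic browser code (not wllama-specific): list every Cache Storage
// cache and the URLs it holds, to confirm the model files are present.
const cacheNames = await caches.keys();
for (const name of cacheNames) {
    const cache = await caches.open(name);
    const requests = await cache.keys();
    console.log(name, requests.map((req) => req.url));
}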

For good measure (read: because I wasn't sure where to put it), I added allow_offline to both the initialization of the model and the query.

window.llama_cpp_app = new Wllama(CONFIG_PATHS, {
    allow_offline: true,
    logger: {
        debug: (...args) => {
            console.debug('🔧', ...args);
        },
        // etc.
    },
});

and

inference_settings = {
    "n_ctx":2048,
    "temp":0,
    "allow_offline":true
}

Should I perhaps use something other than loadModelFromUrl when in offline mode?

await window.llama_cpp_app.loadModelFromUrl(model_url, model_settings);
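(One way to at least make the failure mode clearer, just a sketch and not wllama's documented API: check connectivity up front so the offline path is explicit. navigator.onLine is a standard browser property; model_url and model_settings are the same variables as above.)

// Minimal sketch (not part of wllama): make the offline path explicit.
if (!navigator.onLine) {
    console.warn('Offline: expecting loadModelFromUrl to be served from the cache');
}
try {
    await window.llama_cpp_app.loadModelFromUrl(model_url, model_settings);
} catch (err) {
    console.error('Model load failed (offline and not fully cached?):', err);
    throw err;
}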
flatsiedatsie commented 1 month ago

Doh, I thought I was adding it everywhere, but I wasn't.

const outputText = await window.llama_cpp_app.createCompletion(total_prompt, {
    //nPredict: 500,
    allow_offline: true,
    sampling: {
        temp: 0, //temperature
        //top_k: top_k, //40
        //top_p: top_p, //0.9
    },
    useCache: true,
    onNewToken: (token, piece, currentText, { abortSignal }) => {
        if (window.interrupt_wllama) {
            console.log("sending interrupt signal to Wllama");
            abortSignal();
            window.interrupt_wllama = false;
        } else {
            //console.log("wllama: onNewToken: token,piece,currentText:", token, piece, currentText);
            let new_chunk = currentText.substr(response_so_far.length);
            window.handle_chunk(my_task, response_so_far, new_chunk);
            response_so_far = currentText;
        }
    },
});
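(A way to avoid forgetting the flag again, purely illustrative with my own names and nothing from wllama's API: keep it in one shared options object and spread it into every call.)

// Illustrative only: a single place for the offline flag so it is
// passed consistently to both model loading and completion calls.
const OFFLINE_OPTS = { allow_offline: true };

await window.llama_cpp_app.loadModelFromUrl(model_url, {
    ...inference_settings,
    ...OFFLINE_OPTS,
});

const outputText = await window.llama_cpp_app.createCompletion(total_prompt, {
    useCache: true,
    ...OFFLINE_OPTS,
    // ...sampling and onNewToken as above
});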