huggingface / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

Zombies in memory - something is blocking (re)loading of Whisper after a page is closed and re-opened #958

Closed flatsiedatsie closed 1 week ago

flatsiedatsie commented 2 weeks ago

Question

I've been trying to debug this issue all afternoon, but haven't gotten any further. The code runs on desktop, but not on Android Chrome.

This is with V3 Alpha 19.

[Screenshots: console errors, 2024-10-02]

flatsiedatsie commented 2 weeks ago

The demo on HuggingFace does work on the phone... hmm. I'll have to dive deeper.

flatsiedatsie commented 1 week ago

I did a rewrite to more closely follow the recent examples in the hopes that that would fix the issue. But after all that, I still get the same type of error. It still only occurs on mobile (Android Chrome), everything runs great on desktop. I've even implemented the option to go turbo while I was at it.

But since the demo did run on the phone, it must be my code/situation.

Things I've tried:

Some avenues I could still explore, simply out of desperation:

[Screenshots: console errors, 2024-10-05]

The errors imply a memory issue. But at this point Whisper is the only AI running.

My next step is to create a minimal example that calls the worker, to see if I can rule out interference from another library.

flatsiedatsie commented 1 week ago

Some small questions:

In some example code I noticed

export default {
    DEFAULT_LANGUAGE: "english",
    (etc)
};

Yet in the rest of the code the language is always set with a two-letter code?

I've also seen the language code being set, yet I sometimes get errors saying that for English-only models the language should not be explicitly set.
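
My current working theory, expressed as a sketch (the helper is mine; it assumes English-only checkpoints always carry the '.en' suffix, e.g. whisper-small.en):

function build_asr_options(model_id) {
    const options = {
        chunk_length_s: 20,
        stride_length_s: 3,
        return_timestamps: 'word',
    };
    // English-only models reject an explicit language; only set it for multilingual ones
    if (!model_id.includes('.en')) {
        options.language = 'en'; // two-letter ISO code
        options.task = 'transcribe';
    }
    return options;
}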

flatsiedatsie commented 1 week ago

I created the minimal test case, and the error still exists. I also noticed something: Transformers.js seems to try to claim memory before the model has fully loaded:

[Screenshot: console output, 2024-10-05]

flatsiedatsie commented 1 week ago

Hmm, I did a full cache clean on the phone again and made some modifications. The result on mobile now:

[Screenshot: mobile result, 2024-10-05]

flatsiedatsie commented 1 week ago

I think I may have figured it out!

I was trying to get it to use less memory by enabling quantization. But that was in fact the problem.

flatsiedatsie commented 1 week ago

Turns out it wasn't that simple :-(

I could get it to run in the minimal variant, but it won't run properly as part of the larger whole.

Also, it seems as if something remains in memory that blocks the creation of a new instance, even after the page has been closed. If I manually force-kill Brave's GPU process and the render process, then things get reset and I can create a new instance. Otherwise the getInstance promise never resolves.

flatsiedatsie commented 1 week ago

Hardcoding the settings seems to have a positive effect 0_0

    if (self.device == 'webgpu') {
        this.instance = pipeline(this.task, this.model_id, {
            "dtype": {
                "encoder_model": "fp32",
                "decoder_model_merged": "q4"
            },
            "device": "webgpu",
            progress_callback
        });
    }
    else {
        this.instance = pipeline(this.task, this.model_id, {
            "dtype": "q8",
            "device": "wasm",
            progress_callback
        });
    }
flatsiedatsie commented 1 week ago

Strange, even WASM fails.

[Screenshot: WASM error, 2024-10-05]

flatsiedatsie commented 1 week ago

I tried reverting to Alpha 15, but the crash still occurs, which once again points to my code...

flatsiedatsie commented 1 week ago

I discovered a pattern:

So it really seems that 'something' remains alive after I close the tab.

I then tried to add this to the main code:

window.onbeforeunload = function() {
    console.log("BEFORE UNLOAD");
    if(window.whisper_worker != null){
        console.log("BEFORE UNLOAD:terminating whisper worker");
        window.whisper_worker.terminate();
    }
    return '';
};

To see if I could quickly kill the Whisper Worker when a tab is closed. But that did not seem to 'unblock' things when I created a new tab.
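
One caveat I've since read about (my assumption, not verified on every browser): beforeunload often doesn't fire on mobile, where tabs can be frozen or discarded without a full unload. pagehide is supposed to fire more reliably, so a variation could be:

window.addEventListener('pagehide', () => {
    // 'pagehide' fires more reliably than 'beforeunload' on mobile browsers
    if (window.whisper_worker != null) {
        console.log("PAGEHIDE: terminating whisper worker");
        window.whisper_worker.terminate();
        window.whisper_worker = null;
    }
});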

flatsiedatsie commented 1 week ago
import { 
    pipeline, 
    env, 
    AutoTokenizer,
    AutoProcessor, 
    AutoModel, 
    AutoModelForAudioFrameClassification,
    WhisperTextStreamer,
    WhisperForConditionalGeneration,
    full,
} from './tjs/transformers.min.js';

const MAX_NEW_TOKENS = 64;

env.allowLocalModels = false;
env.allowRemoteModels = true;
env.useBrowserCache = true;

self.device = 'webgpu';
env.backends.onnx.wasm.proxy = false;

class PipelineFactory {
    static task = null; //"automatic-speech-recognition";
    static model = null; //'onnx-community/whisper-small.en_timestamped';
    static instance = null;

    static model_id = 'onnx-community/whisper-small.en_timestamped';

    constructor(tokenizer, model, quantized) {
        //console.log("in pipelineFactory constructor.  tokenizer, model, quantized: ", tokenizer, model, quantized);
        //console.log("pipelineFactory: in constructor");
        this.tokenizer = tokenizer;
        this.model = model;
    }

    static instance_exists(){
        console.log("returning if instance exists");
        return this.instance != null;
    }

    static set_to_null(var_to_null=null) {
        if(typeof var_to_null == 'string' && typeof this[var_to_null] != 'undefined'){
            this[var_to_null] = null;
            console.log("ASR PipelineFactory: set_to_null: ", var_to_null);
        }
    }

    static async getInstance(progress_callback = null, model_id = 'onnx-community/whisper-small.en_timestamped') {
        console.log("ASR: getInstance: model_id: ", model_id);

        this.model = model_id;
        this.model_id = model_id;

        console.log("\n\npipelineFactory: getInstance");
        console.log("- this.task: ", this.task);
        console.log("- this.model_id: ", this.model_id);
        console.log("- this.model: ", this.model);
        console.log("- self.device: ", self.device);

        if (this.instance === null) {
            console.log("PipelineFactory: this.instance was null, creating pipeline promise");

            if (self.device == 'webgpu') {
                this.instance = pipeline(this.task, this.model_id, {
                    "dtype": {
                        "encoder_model": "fp32",
                        "decoder_model_merged": "q4" // "fp32"
                    },
                    "device": "webgpu",
                    progress_callback
                });
            }
            else {
                this.instance = pipeline(this.task, this.model_id, {
                    "dtype": "q8",
                    "device": "wasm",
                    progress_callback
                });
            }
        }
        else {
            console.log("ASR pipeline getInstance: this.instance already existed");
        }

        //console.log("PipelineFactory: returning this.instance: ", this.instance);
        return this.instance;
    }
}

class AutomaticSpeechRecognitionPipelineFactory extends PipelineFactory {
    static task = "automatic-speech-recognition";
    static model = null;
    static quantized = null;
}

const transcribo = async (message,preload=false) => {
    console.log("whisper_worker: in new transcribo function.  message,preload: ", message, preload);

    // Storage for transcribed chunks, filled by the streamer callbacks below
    const chunks = [];
    let output = null;
    let tps;

    try{

        if(typeof message.model != 'string'){
            console.error("transcribe: message.model was not a string!");
            return null;
        }
        console.log("transcribo: message.model: ", message.model);
        self.current_asr_model_id = message.model;

        if(typeof message.options == 'undefined'){
            console.error("transcribe: message.options was undefined!");
            return null;
        }

        let asr_options = JSON.parse(JSON.stringify(message.options));

        console.log("transcribe: initial asr_options: ", asr_options);

        /*
        let asr_options = {
            // Greedy
            top_k: 0,
            do_sample: false,

            // Sliding window
            chunk_length_s:20,
            stride_length_s:3,

            // Language and task
            //language:'en',
            //language:'english',
            //task: "transcribe",

            // Return timestamps
            return_timestamps: 'word',
            force_full_sequences: false,

            // Callback functions
            //streamer, // after each generation step
        }
        */

        const p = AutomaticSpeechRecognitionPipelineFactory;

        if (p.model !== message.model){

            // Invalidate model if different
            console.warn("whisper_worker: need to load a new ASR model: ", message.model);
            p.model = message.model;

            if (p.instance !== null) {
                console.log("whisper_worker: disposing of old ASR instance first");
                (await p.getInstance()).dispose();
                p.instance = null;
            }
        }

        // Load transcribot model
        const transcribot = await p.getInstance((data) => {
            //console.log("whisper_worker: transcribot: got data: ", data);
            self.postMessage(data);
        }, message.model);

        console.warn("\n\nHURRAY, GOT BEYOND TRANSCRIBOT CREATION\n\n");

        //console.log("transcribot loaded?: ", transcribot);
        //console.log("transcribot model: ", transcribot.tokenizer);
        //console.log("transcribot model: ", transcribot.model);
        //console.log("transcribot processor: ", transcribot.processor);

        if(preload){
            /*
            if(self.device == 'webgpu' && typeof transcribot.model == 'object' && transcribot.model != null && typeof transcribot.model.generate === 'function'){
                console.log("transcribot: preloading: attempting to warm-up the transcribot model (transcribot.model.generate is a function)");
                self.postMessage({
                    status: 'asr_warming_up',
                    data: 'Compiling shaders and warming up model...'
                });

                // Run model with dummy input to compile shaders. Only needed if running via WebGPU
                await transcribot.model.generate({
                    input_features: full([1, 80, 3000], 0.0),
                    max_new_tokens: 1,
                });
            }
            */
            console.warn("transcribe: ending early because this was a preload run");
            return true;
        }

        if(typeof message.task == 'undefined' || message.task == null || typeof message.task.recorded_audio == 'undefined'){
            console.error("transcribo: NO AUDIO!");
            return null;
        }

        const time_precision =
            transcribot.processor.feature_extractor.config.chunk_length /
            transcribot.model.config.max_source_positions;

        console.log("transcribo: time_precision: ", time_precision);

        // TODO: Storage for fully-processed and merged chunks
        // let decoded_chunks = [];

        let chunk_count = 0;
        let start_time;
        let num_tokens = 0;

        console.log("creating streamer next. transcribot.tokenizer: ", transcribot.tokenizer);

        const streamer = new WhisperTextStreamer(transcribot.tokenizer, {
            time_precision,
            on_chunk_start: (x) => {
                const offset = (asr_options['chunk_length_s'] - asr_options['stride_length_s']) * chunk_count;
                chunks.push({
                    text: "",
                    timestamp: [offset + x, null],
                    finalised: false,
                    offset,
                });
            },
            token_callback_function: (x) => {
                start_time ??= performance.now();
                if (num_tokens++ > 0) {
                    tps = (num_tokens / (performance.now() - start_time)) * 1000;
                }
            },
            callback_function: (x) => {
                if (chunks.length === 0) return;
                // Append text to the last chunk
                chunks.at(-1).text += x;

                self.postMessage({
                    status: "asr_update",
                    data: {
                        text: "", // No need to send full text yet
                        chunks,
                        tps,
                    },
                });
            },
            on_chunk_end: (x) => {
                const current = chunks.at(-1);
                current.timestamp[1] = x + current.offset;
                current.finalised = true;
            },
            on_finalize: () => {
                start_time = null;
                num_tokens = 0;
                ++chunk_count;
            },
        });
        asr_options['streamer'] = streamer;

        console.log("asr_options: ", JSON.stringify(asr_options,null,4));

        self.postMessage({ status: 'pipeline_ready' });

        console.error("\n\n\nOK\n\n\n\nWHISPER: AUDIO LENGTH: ", message.task.recorded_audio.length);
        //console.error("WHISPER AUDIO: ", message.task.recorded_audio);

        // Actually run transcription
        output = await transcribot(message.task.recorded_audio, asr_options).catch((error) => {
            console.error("caught error in transcribot: ", error);
            self.postMessage({
                status: "error",
                data: error,
            });
            return null;
        });

        console.log("whisper_worker: RAW ASR output: ", output);

    }
    catch(err){
        console.error("caught error in transcribe: ", err);
    }

    return {
        tps,
        ...output,
        chunks,
    };
};
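
For completeness, the onmessage wiring isn't shown above. A minimal sketch of it (the message shape is my own convention):

self.onmessage = async (event) => {
    const message = event.data;
    // Preload runs only download and instantiate the model
    const result = await transcribo(message, message.preload === true);
    self.postMessage({
        status: "complete",
        data: result,
    });
};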
flatsiedatsie commented 1 week ago

Hmm, on desktop, the minimal test version does run every time without issue. But only if I don't also have another tab open with a frozen Whisper Worker.

So a frozen Whisper Worker in tab A also blocks loading whisper in a new Whisper Worker in tab B.

This would imply that something in my project's code is creating the condition that freezes up the loading process in such a hardcore way that even other tabs are affected.

// Hmm, but everything works fine the first time, when the browser is fresh (or after manually killing the related processes... or killing the worker and then restarting it again...). So that would imply something gets set on the first load of Whisper that interferes with itself after a page reload.

But that issue doesn't happen in the minimal test, so something in my code/situation is causing Whisper's worker to not properly die/unload/get cleaned after the page is closed.

// The minimal version immediately crashes on mobile.

[Screenshot: mobile crash, 2024-10-06]

flatsiedatsie commented 1 week ago

By Jove, I think I may have cracked the case!

flatsiedatsie commented 1 week ago

I've solved one of the issues.

It turns out that I was sending data to the Whisper Worker right after it was created. But the worker wasn't actually 'loaded in' at that point yet.

So I've added this to the end of the worker script:

self.postMessage({
    status: "exists"
});

Only once the main script has received the "exists" message will it send the worker the data. So it was an issue in my code (as was most likely) that slipped in during the rewrite.

The delay has solved things on the desktop side. It now works perfectly again.
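
Roughly, the main-thread side now looks like this (a sketch; variable names are mine):

const whisper_worker = new Worker('./whisper_worker.js', { type: 'module' });

let worker_ready = false;
const queued_messages = [];

whisper_worker.addEventListener('message', (event) => {
    if (event.data.status === 'exists') {
        // The worker script has finished evaluating, so it's now safe to post data
        worker_ready = true;
        for (const msg of queued_messages) {
            whisper_worker.postMessage(msg);
        }
        queued_messages.length = 0;
    }
    // ... handle the other statuses (asr_update, error, etc.)
});

function send_to_whisper_worker(msg) {
    if (worker_ready) {
        whisper_worker.postMessage(msg);
    } else {
        queued_messages.push(msg);
    }
}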

On the mobile side, however, this fix unfortunately hasn't resolved the issue. There I still see:

[Screenshot: mobile error, 2024-10-07]

For now I'm assuming that's an indication of a 'real' out-of-memory issue. And now I'm back to trying variations of enabling quantization and/or changing q4 to fp32 for the encoder.

It was working fine on mobile for a long time, so I know it's possible.

flatsiedatsie commented 1 week ago

and

[Screenshot: error, 2024-10-07]

flatsiedatsie commented 1 week ago

Tested on another Android device, a Samsung tablet with just 2GB of RAM.

Also tested on an iPhone SE 2020, with 3GB of RAM.

flatsiedatsie commented 1 week ago

Since it seems to be a combination of an issue in my code and, seemingly, an issue with mobile Chrome, I'm going to close this issue.

Phew!

flatsiedatsie commented 6 hours ago

I've noticed that in general it's a good idea to create a 'preload' function, where the worker first downloads the model files and, once that is done, sends a "preloaded" message back to the main thread; only then does the main thread send it the actual tasks.

I don't know why this is. It could be that it adds a small delay? But I've now added this to the TTS worker too, and it seems much happier.

Still, sporadically a worker will still freeze. I've resorted to adding code that checks if it has taken more than 15 seconds for the worker to create an instance. If the main thread doesn't get a success mesage within that timeframe, the worker will be terminated. This.. isn't pretty.