xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

v3: Issue loading T5-Small with webgpu backend #773

Open · r4ghu opened this issue 1 month ago

r4ghu commented 1 month ago

System Info

Environment/Platform

Description

Hi, first of all, thank you for the awesome work you are doing with transformers.js. It has lowered my barrier to entry for developing ML models for web applications and extensions. I am currently working on a T5-small based text generation example, and I was able to get it working with the following configuration:

When I modify the code to run with v3, I run into the following issue with device set to webgpu and dtype set to fp32:

@xenova_transformers.js?v=9f46ee43:3066 Uncaught (in promise) TypeError: Nt[l] is not a function
    at I (@xenova_transformers.js?v=9f46ee43:3066:57)
    at $func11708 (ort-wasm-simd.jsep.wasm:0xf9c7c9)
    at $func205 (ort-wasm-simd.jsep.wasm:0x25ad3)
    at $func4462 (ort-wasm-simd.jsep.wasm:0x647371)
    at $func2008 (ort-wasm-simd.jsep.wasm:0x20c390)
    at $func10915 (ort-wasm-simd.jsep.wasm:0xef8bbe)
    at $ta (ort-wasm-simd.jsep.wasm:0xb4611e)
    at f.<computed> (@xenova_transformers.js?v=9f46ee43:2685:22)
    at r._OrtCreateSession (@xenova_transformers.js?v=9f46ee43:3141:678)
    at Object.<anonymous> (@xenova_transformers.js?v=9f46ee43:2118:15)
    at Pd (@xenova_transformers.js?v=9f46ee43:11193)
    at Hd (@xenova_transformers.js?v=9f46ee43:11463)
    at loadModel (@xenova_transformers.js?v=9f46ee43:11534)
    at createInferenceSessionHandler (@xenova_transformers.js?v=9f46ee43:11586)
    at create (@xenova_transformers.js?v=9f46ee43:1901)

I am fairly sure Xenova/t5-small runs under transformers.js, as I tested its performance using the WebGPU Embedding Benchmark.

Any suggestions on how to implement model loading and inference for the T5-small model would be really helpful. Thank you for your time.

Reproduction

npm install xenova/transformers.js#v3

Code to load the model

import { AutoTokenizer, AutoModelForSeq2SeqLM } from '@xenova/transformers';

class GeneratorSingleton {
    static model_id = 'Xenova/t5-small';
    static model = null;
    static tokenizer = null;

    static async getInstance(progress_callback = null) {
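        // from_pretrained returns promises; they are cached here and
        // resolved together via Promise.all below.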
        if (!this.tokenizer) {
            this.tokenizer = AutoTokenizer.from_pretrained(this.model_id);
        }

        if (!this.model) {
            this.model = AutoModelForSeq2SeqLM.from_pretrained(this.model_id, {
                dtype: 'fp32',
                device: 'webgpu',
                progress_callback,
            });
        }

        progress_callback({ status: 'ready' });

        return Promise.all([this.tokenizer, this.model]);
    }
}
Th3G33k commented 1 month ago

Since getInstance calls progress_callback directly, it cannot be null; it should default to a function.

// default empty function
static async getInstance(progress_callback = ()=>{}) {}

// default console log
static async getInstance(progress_callback = (x)=>console.log(x)) {}
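For illustration (this snippet is not from the thread): with the null default, the unconditional call in getInstance throws on its own, independent of the wasm error above.

// Hypothetical minimal repro of the null-callback problem.
const progress_callback = null;
progress_callback({ status: 'ready' }); // TypeError: progress_callback is not a function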
r4ghu commented 3 weeks ago

Update: I got this working by explicitly setting env.backends.onnx.wasm.wasmPaths before initializing the model. Here is the updated code that worked for me:

import { env, AutoTokenizer, AutoModelForSeq2SeqLM } from '@xenova/transformers';

class GeneratorSingleton {
    static model_id = 'Xenova/t5-small';
    static model = null;
    static tokenizer = null;

    static async getInstance(progress_callback = (x) => console.log(x)) {
        if (!this.tokenizer) {
            this.tokenizer = AutoTokenizer.from_pretrained(this.model_id);
        }

        if (!this.model) {
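            // Pin the onnxruntime-web wasm binaries to a specific CDN build
            // and run single-threaded; per the update above, this is what
            // resolved the wasm TypeError.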
            env.backends.onnx.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0-dev.20240521-068bb3d5ee/dist/';
            env.backends.onnx.wasm.numThreads = 1;

            this.model = AutoModelForSeq2SeqLM.from_pretrained(this.model_id, {
                dtype: 'fp32',
                device: 'webgpu',
                progress_callback,
            });
        }

        progress_callback({ status: 'ready' });

        return Promise.all([this.tokenizer, this.model]);
    }
}
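For completeness, a minimal inference sketch (added for illustration; the prompt, task prefix, and generation options are assumptions, not from the thread) using the resolved tokenizer and model:

// Illustrative usage of the singleton above with the transformers.js generate API.
const [tokenizer, model] = await GeneratorSingleton.getInstance();

// T5 uses task prefixes; this translation prompt is only an example.
const inputs = tokenizer('translate English to German: Hello, how are you?');

// generate() returns output token ids; batch_decode converts them to text.
const output_ids = await model.generate({ ...inputs, max_new_tokens: 50 });
console.log(tokenizer.batch_decode(output_ids, { skip_special_tokens: true }));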