xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js

No performance gain on using WebGPU #796

Closed mr-sarthakgupta closed 2 weeks ago

mr-sarthakgupta commented 2 weeks ago

Question

I want to use the model https://huggingface.co/Xenova/clip-vit-large-patch14 with WebGPU for quick inference in the browser. I ran the WebGPU benchmark to check the expected speed-up, and it showed a ~7x improvement on my device.

But when I run the CLIP model linked above, there's barely any difference in performance with and without WebGPU.

xenova commented 2 weeks ago

Can you include the code you are running? You may need to update the dtype to one which produces better results.
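
For example, something like this minimal sketch (assuming the v3 `from_pretrained` options; the exact set of supported dtypes can vary per model):

import { CLIPVisionModelWithProjection } from '@xenova/transformers';

// Try different precisions, e.g. 'fp32', 'fp16' or 'q8',
// and compare speed/quality on your hardware.
const model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-large-patch14', {
    device: 'webgpu',
    dtype: 'fp16',
});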

mr-sarthakgupta commented 2 weeks ago

I tried the following ways:

import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';

// Load the processor and the CLIP vision tower on WebGPU with fp16 weights
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-large-patch14');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-large-patch14', {
    device: 'webgpu',
    dtype: 'fp16',
});

const image = await RawImage.read(url);
const time_start = performance.now();
const image_inputs = await processor(image);

const { image_embeds } = await vision_model(image_inputs);
console.log(`Forward pass took ${performance.now() - time_start} ms`);

This took 17371 ms with fp16 and 17353 ms with fp32.

The second method I tried was:

import { pipeline } from '@xenova/transformers';

// Same model via the image-feature-extraction pipeline, still on WebGPU
const vision_model = await pipeline('image-feature-extraction', 'Xenova/clip-vit-large-patch14', {
    device: 'webgpu',
    dtype: 'fp16',
});

const image_embeds = await vision_model(url);

At fp16 this took 17549 ms and at fp32 it took 16576 ms.

While without WebGPU, using:

// Same pipeline, but without the WebGPU device option (i.e. the default backend)
const vision_model = await pipeline('image-feature-extraction', 'Xenova/clip-vit-large-patch14', {
    dtype: 'fp16',
});

const image_embeds = await vision_model(url);

I got a forward pass time of 16753 ms.

Yet in the WebGPU benchmark, even at batch size = 1, I saw a huge speed improvement on my device:

[benchmark screenshot]

xenova commented 2 weeks ago

Hmm, strange. Your code looks right. 🤔

Could you try this demo: https://huggingface.co/spaces/Xenova/webgpu-clip? It should run real-time CLIP with WebGPU. You can also try a smaller CLIP model like https://huggingface.co/Xenova/clip-vit-base-patch32; maybe the large one has some issues with the ONNX export.

https://github.com/xenova/transformers.js/assets/26504141/75a4ab6f-41f2-4a00-9967-3cd7dcaa801e
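
For a quick A/B check, a sketch like the following (assuming the v3 dev branch; the image URL is just a placeholder) should show whether the WebGPU backend is actually being used:

import { pipeline } from '@xenova/transformers';

const url = 'https://example.com/cat.jpg'; // placeholder: any reachable image URL

// Run the same pipeline once per backend, timing the second call so that
// session creation / shader compilation isn't counted.
for (const device of ['wasm', 'webgpu']) {
    const extractor = await pipeline('image-feature-extraction', 'Xenova/clip-vit-base-patch32', { device, dtype: 'fp32' });
    await extractor(url); // warm-up
    const start = performance.now();
    await extractor(url);
    console.log(`${device}: ${(performance.now() - start).toFixed(0)} ms`);
}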

mr-sarthakgupta commented 2 weeks ago

Unfortunately, I need the projection dimension to be 768, which is only the case for the large model. It's really strange indeed; the demo works perfectly fine on my device too, at ~5 FPS.

Would it be possible to try the large model in the demo to see if it's an issue specific to the large model? Also, which model does this demo use?

Edit: Just tried the model https://huggingface.co/Xenova/clip-vit-base-patch32 and found the same trend: no significant change in inference speed with or without WebGPU.

mr-sarthakgupta commented 2 weeks ago

Hi @xenova, could it have something to do with the fact that I'm using the model in a browser extension? Also, would it be possible for you to provide the code for the webgpu-clip demo?

xenova commented 2 weeks ago

could it have something to do with the fact that I'm using the model in a browser extension?

Hmm, good question. Just to confirm: have you installed Transformers.js v3 from the dev branch with the following command? You might still be using v2.

npm install xenova/transformers.js#v3
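
One way to double-check which version your extension is actually bundling (a small sketch; I believe the exported `env` object includes the library version):

import { env } from '@xenova/transformers';

// Should log a 3.x version if the v3 dev branch is installed;
// a 2.x version means you're still on the published v2 release.
console.log(env.version);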

Also, would it be possible for you to provide the code for the webgpu-clip demo?

Sure - here's the source code: https://github.com/xenova/transformers.js/tree/v3/examples/webgpu-clip

mr-sarthakgupta commented 2 weeks ago

Hmm, good question. Just to confirm: have you installed Transformers.js v3 from the dev branch with the following command?

This is it! The models are now running at the same speed as the demos. Thanks for the help!