Open tarekziade opened 8 months ago
Can you try using the unquantized version? Done by specifying:
const pipe = await pipeline('task', 'model', { quantized: false });
It's slightly faster.
Maybe it's the image encoding step; I will try to measure each step.
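For context, a fuller sketch of the suggestion above, assuming the transformers.js object-detection pipeline API (the model name comes from this thread; `imageElement` and the `threshold` value are illustrative):

```javascript
import { pipeline } from '@xenova/transformers';

// Load the unquantized (fp32) ONNX weights instead of the default quantized ones.
const detector = await pipeline('object-detection', 'Xenova/yolos-tiny', {
    quantized: false,
});

// Run detection on an image; threshold filters out low-confidence boxes.
const output = await detector(imageElement.src, { threshold: 0.9 });
console.log(output);
```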
I tried this (imports added for completeness; note that YOLOS returns logits and predicted boxes, not image embeddings):
import { AutoModelForObjectDetection, AutoProcessor, RawImage } from '@xenova/transformers';

const model_name = "Xenova/yolos-tiny";
const model = await AutoModelForObjectDetection.from_pretrained(model_name);

let start = Date.now();
const processor = await AutoProcessor.from_pretrained(model_name);
const image = await RawImage.read(imageElement.src);
const image_inputs = await processor(image);
let end = Date.now();
console.log(`Image processing Execution time: ${end - start} ms.`);

start = Date.now();
const outputs = await model(image_inputs); // { logits, pred_boxes } for YOLOS
end = Date.now();
console.log(`Inference Execution time: ${end - start} ms.`);
and that gives:
Image processing Execution time: 161 ms.
Inference Execution time: 14652 ms.
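One thing worth ruling out before deeper profiling is the WASM backend's thread and SIMD configuration, since a single-threaded fallback would explain a large gap. A sketch, assuming transformers.js exposes onnxruntime-web's settings under `env.backends.onnx.wasm` (the property names are onnxruntime-web's):

```javascript
import { env } from '@xenova/transformers';

// Multithreading requires cross-origin isolation (COOP/COEP headers)
// for SharedArrayBuffer; without it, execution falls back to one thread.
env.backends.onnx.wasm.numThreads = navigator.hardwareConcurrency ?? 4;

// SIMD is enabled by default in recent builds; set explicitly here
// only to make the configuration visible.
env.backends.onnx.wasm.simd = true;
```

Set these before the first pipeline or model is created, as the WASM runtime is initialized lazily on first use.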
Here's the full session recorded with Firefox's profiler:
https://share.firefox.dev/497JdCi
The slow function is _OrtRun in the ONNX runtime. I don't think I can get more info unless I run it with symbols.
Can you specify the runtime somewhere in the config? I could point it at a build with debug symbols.
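If the goal is swapping in an ORT build with symbols, transformers.js lets you point the ONNX backend at custom .wasm binaries; a sketch, assuming the `env.backends.onnx.wasm.wasmPaths` setting (the `/debug-ort/` path is hypothetical):

```javascript
import { env } from '@xenova/transformers';

// Serve a debug build of ort-wasm-simd.wasm (and the other ORT .wasm
// artifacts) yourself, then point the runtime at that directory
// before creating any pipeline. '/debug-ort/' is a hypothetical path.
env.backends.onnx.wasm.wasmPaths = '/debug-ort/';
```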
This might just be a limitation of onnxruntime-web's WASM execution provider, and can be fixed with the new WebGPU execution provider (coming soon).
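For reference, backend selection in later transformers.js versions is exposed through a `device` option; hedged as a sketch, since the WebGPU API was not final at the time of this thread:

```javascript
import { pipeline } from '@xenova/transformers';

// Request the WebGPU execution provider instead of the default WASM one
// (requires a browser with WebGPU support).
const pipe = await pipeline('object-detection', 'Xenova/yolos-tiny', {
    device: 'webgpu',
});
```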
@fs-eire @guschmue might be able to do more in-depth profiling.
System Info
latest wasm version
Environment/Platform
Description
I am trying to run https://huggingface.co/hustvl/yolos-tiny using a quantized version (similar to Xenova/yolos-tiny), and it works using the object-detection pipeline, but it is extremely slow. Inference on a single image with this model in transformers.js takes around 15 seconds on my M1; the same model in Python transformers takes 190 ms.
I tried to run the web dev tool, and the culprit is in the ONNX runtime at wasm-function[10863] @ ort-wasm-simd.wasm:0x801bfa, but I don't have the debug symbols, so it's kind of useless...
Is there a way to force transformers.js to run with a debug version of the ort runtime?
Reproduction
Run the object-detection demo at https://xenova.github.io/transformers.js/ and swap the detr-resnet model for yolos-tiny.