xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0
11.27k stars · 702 forks

[Question] Batch inference for vit #424

Closed arseniymerkulov closed 10 months ago

arseniymerkulov commented 10 months ago

It seems like all the tests in the repository related to processors and image models use one image per input.

  1. Do the models support feeding a batch of images as input during inference? Is there a speed benefit from this?
  2. Are there any other optimization/parallelization tools in transformers.js that I can use to process a set of images?

Models used: ViT base (google/vit-base-patch16-224-in21k) and its tiny and small distillations (WinKawaks/vit-tiny-patch16-224), exported to ONNX format with Optimum

xenova commented 10 months ago

Hi there 👋

  1. Yes you can feed a batch of images during inference. Here's an example:

    import { pipeline, RawImage } from '@xenova/transformers';
    
    // Create an image classification pipeline
    const classifier = await pipeline('image-classification', 'Xenova/vit-base-patch16-224');
    const urls = [
        'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg',
        'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg',
    ];
    // Pass an array of inputs to run batched inference
    const output = await classifier(urls);
    // [
    //   { label: 'tiger, Panthera tigris', score: 0.6074584722518921 },
    //   { label: 'Egyptian cat', score: 0.8246098756790161 }
    // ]

    To measure performance improvements:

    // Preload the images so that download time is excluded from the measurement
    const imgs = await Promise.all(urls.map(url => RawImage.fromURL(url)));
    
    {
        const start = performance.now();
        await classifier(imgs);
        const end = performance.now();
        console.log('Running batched:', end - start); // 180.2979999780655
    }
    {
        const start = performance.now();
        for (const img of imgs) {
            await classifier(img);
        }
        const end = performance.now();
        console.log('Running sequentially:', end - start); // 194.33630001544952
    }
    {
        const start = performance.now();
        await Promise.all(imgs.map(img => classifier(img)));
        const end = performance.now();
        console.log('Running with Promise.all():', end - start); // 184.42629998922348
    }

    Since everything currently runs on the CPU, you most likely won't see dramatic speedups from batched inputs, but I encourage you to do some testing to find the batch size that works best for you.
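    A quick batch-size sweep might look something like the following (a rough sketch: `chunk` is a hypothetical helper, not part of the library, and the optimal size will depend on your hardware):

    // Hypothetical helper: split an array into chunks of the given size
    function chunk(arr, size) {
        const out = [];
        for (let i = 0; i < arr.length; i += size) {
            out.push(arr.slice(i, i + size));
        }
        return out;
    }
    
    // Time each candidate batch size over the preloaded images
    for (const batchSize of [1, 2, 4, 8]) {
        const start = performance.now();
        for (const batch of chunk(imgs, batchSize)) {
            await classifier(batch);
        }
        const end = performance.now();
        console.log(`Batch size ${batchSize}:`, end - start);
    }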

  2. We rely on onnxruntime-web for running inputs in parallel, but there does seem to be a bug that prevents multiple threads from being used (see the thread-count sketch below).

    Fortunately, the WebGPU backend is nearly ready, and it handles parallelization much better (since it runs on your GPU), which would really help for vision tasks like image classification.
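    In the meantime, you can request a specific WASM thread count via the `env` object. Here is a minimal, self-contained sketch; whether multiple threads actually take effect depends on the bug above and on the page being cross-origin isolated (COOP/COEP headers):

    import { env, pipeline } from '@xenova/transformers';
    
    // Request multiple WASM threads for onnxruntime-web.
    // Note: this only applies when the page is cross-origin isolated,
    // and may currently be limited by the threading bug mentioned above.
    env.backends.onnx.wasm.numThreads = 4;
    
    const classifier = await pipeline('image-classification', 'Xenova/vit-base-patch16-224');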

arseniymerkulov commented 10 months ago

Thank you for your answer!