roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

[InferenceSlicer] - allow batch size inference #781

Open · inakierregueab opened this issue 8 months ago

inakierregueab commented 8 months ago

Description

Currently, sv.InferenceSlicer processes each slice in a separate callback call, which prevents inference with a batch size larger than 1. We could change this by allowing the callback to accept a batch (list) of slices.
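For context, a minimal sketch of the current single-slice pattern (model and file names are illustrative):

```python
import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

def callback(image_slice: np.ndarray) -> sv.Detections:
    # Called once per slice, so the model only ever sees a batch of 1.
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)

slicer = sv.InferenceSlicer(callback=callback)
detections = slicer(cv2.imread("image.jpg"))
```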

Additional

SkalskiP commented 8 months ago

Hi, @inakierregueab 👋🏻 That is something we were considering but didn't implement due to time restrictions. Let me add some details to this issue. Maybe someone will pick it up.

Bhavay-2001 commented 8 months ago

Hi @SkalskiP, can I work on this issue if it is for beginners? Thanks

SkalskiP commented 8 months ago

Hi, @Bhavay-2001 👋🏻 Do you already have experience with running model inference at different batch sizes?

Bhavay-2001 commented 8 months ago

Hi @SkalskiP, yes I think I can manage that. Can you please let me know how to proceed with this? Thanks

SkalskiP commented 8 months ago

Great! Do you have any specific questions?

Bhavay-2001 commented 8 months ago

Hi @SkalskiP, how should I add the batch_size feature to the InferenceSlicer class? How can I test it in Google Colab? Any starting point that can help me get on track would be helpful.

SkalskiP commented 8 months ago

I outlined the vital steps needed to add batch_size support in the task description. I think you should just try to implement it, get a first working version, and submit a PR so we can review it.

Bhavay-2001 commented 8 months ago

Hi @SkalskiP, can you please point me to an existing code sample that already implements the batch_size functionality?

SkalskiP commented 8 months ago

@Bhavay-2001, I'm afraid we do not have a code sample. Implementing batch inference is exactly what this task is supposed to deliver. :/

Bhavay-2001 commented 8 months ago

@SkalskiP, what I am thinking of doing is to implement a for loop over a batch of images. Each image is passed to the model, the detections are collected, and at the end the detections for the whole batch are returned.
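A minimal sketch of that idea, assuming the callback signature is extended to accept a list of slices (the name `batch_callback` is illustrative):

```python
from typing import List

import numpy as np
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

def batch_callback(image_slices: List[np.ndarray]) -> List[sv.Detections]:
    # Loop over the batch, run the model per image, collect detections.
    all_detections = []
    for image_slice in image_slices:
        result = model(image_slice)[0]
        all_detections.append(sv.Detections.from_ultralytics(result))
    return all_detections
```

Note that a per-image loop keeps the effective batch size at 1; for a real speedup the whole list has to reach the model in a single forward pass, e.g. `results = model(image_slices)` with Ultralytics.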

Bhavay-2001 commented 8 months ago

Hi @SkalskiP, can you please review this PR?

Bhavay-2001 commented 7 months ago

Hi @SkalskiP, can you please review and let me know. Thanks

LinasKo commented 6 months ago

Me and SkalskiP had a conversation about this - I'll take over for now.

LinasKo commented 6 months ago

Intermediate results:

  1. I've confirmed that threads help, especially when the model is run on the CPU. I see a 5-10x performance improvement.
  2. I've implemented the batched inference slicer, allowing users to input both images and lists of images.
  3. The threading implementation is kept; the docs point users to either batch=N; threads=1 or batch=1; threads=N, depending on whether they run on GPU or CPU (see the sketch below).
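A hypothetical configuration sketch of that guidance; `batch_size` and `thread_workers` are illustrative names based on the description above, not a confirmed public API:

```python
import supervision as sv

# GPU: lean on batching, keep a single worker thread.
gpu_slicer = sv.InferenceSlicer(
    callback=batch_callback,  # accepts a list of slices, as sketched earlier
    batch_size=8,             # assumed parameter name
    thread_workers=1,
)

# CPU: batching rarely helps, so parallelize across threads instead.
cpu_slicer = sv.InferenceSlicer(
    callback=callback,
    batch_size=1,             # assumed parameter name
    thread_workers=8,
)
```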

Testing more broadly, however, provides mixed results.

  1. On my machine, batching provides a speed boost for ultralytics, but does nothing for transformers (GPU) or for inference (CPU, I believe).
  2. Using threads=8 slows down the ultralytics batch=1 case compared to threads=1 - but only on my machine; in Colab it is faster.

Still checking transformers - there's an obvious speedup with GPU, but I ran out of memory when trying with batching.

Colab coming soon.

LinasKo commented 6 months ago

https://colab.research.google.com/drive/1j85QErM74VCSLADoGliM296q4GFUdnGM?usp=sharing

As you can see, in these tests it only helped the Ultralytics case.

Known insufficiencies:

LinasKo commented 6 months ago

PR: #1108

LinasKo commented 6 months ago

@SkalskiP, Ready for review, details in #1108.

Geethen commented 3 months ago

This batched inference slicer does not write the detections to file. Also, a drawback of the initial InferenceSlicer is that it assumes the entire image can be read into memory. This may not be the case when dealing with large satellite images. A solution is windowed reading and writing; the rasterio package offers both.
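For reference, a minimal sketch of windowed reading and writing with rasterio (file names and tile size are illustrative):

```python
import rasterio
from rasterio.windows import Window

TILE = 1024  # tile edge in pixels

with rasterio.open("satellite.tif") as src, \
     rasterio.open("output.tif", "w", **src.profile) as dst:
    for row in range(0, src.height, TILE):
        for col in range(0, src.width, TILE):
            window = Window(
                col, row,
                min(TILE, src.width - col),
                min(TILE, src.height - row),
            )
            patch = src.read(window=window)  # (bands, h, w); only this tile in memory
            # ... run inference on `patch` and keep / georeference detections ...
            dst.write(patch, window=window)
```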

LinasKo commented 3 months ago

Hi @Geethen,

You brought up very good points. Indeed, the slicer is better suited to small objects than to very large images; when dealing with the latter, it will hog all available memory. I'll bring this up in our internal discussions.

As for saving the results, that's decoupled. Check out Sinks.
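For example, saving results with a CSV sink (a minimal sketch; the file name and custom_data are illustrative):

```python
import supervision as sv

with sv.CSVSink("detections.csv") as sink:
    detections = slicer(image)
    sink.append(detections, custom_data={"image": "image.jpg"})
```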

Geethen commented 3 months ago

Hi @Linas,

Thanks for your speedy response. What I meant was that the proposed batched version assumes each batch contains independent samples (seemingly the same with sinks). I was referring to the case where you read a patch from a large image and then write to the same window location in the output.

I look forward to the outcome of your discussions :) I have struggled to find anything like this and have resorted to implementing my own version using rasterio windows.

Thanks, Geethen


linas commented 3 months ago

Hi @Geethen, you've got the wrong Linas. I'm Linas Vepštas. You want @LinasKo