pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

ffcv integration #5954

Open msaroufim opened 2 years ago

msaroufim commented 2 years ago

🚀 The feature

Integrate https://github.com/libffcv/ffcv for accelerated image decoding, preprocessing and loading

Motivation, pitch

I maintain torchserve. We've recently had customers complain about slow image preprocessing/decoding (https://github.com/pytorch/serve/issues/1546), and the performance implications are large. I could solve this locally in torchserve, but solving it a level higher in torchvision means anyone can benefit from the improvements.

Summarizing discussion with @NicolasHug

Alternatives

Some alternatives exist, like DALI. There's also the do-nothing alternative, where we just provide a tutorial on FFCV instead of a tight integration.

Additional context

If this is a reasonable first issue for torchvision, I can pick this up.

NicolasHug commented 2 years ago

Thanks Mark for opening the issue.

From my very brief review of FFCV, there are a few main components:

I have no idea about how the underlying implementation works.

Some of these components might be in scope for torchvision, but a custom DataLoader typically would not be. This might require involving other teams, like the one in charge of DataLoaderV2. So it's not completely clear to me yet what would be relevant and what would not, but this is indeed something we could look into.

msaroufim commented 2 years ago

Agreed. I think I'll investigate them in this order:

  1. Custom decoder
  2. Custom transform
  3. Maybe custom dataset format (needs more investigation)
  4. Custom dataloader - probably not, unless we implement it as a reading service in torchdata
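
The "custom dataset format" idea in item 3 can be sketched in plain Python: pay the decode cost once at construction time, so `__getitem__` returns cached, already-decoded samples. Everything here is hypothetical (the class name, the toy `decode` function); it is a stand-in for a `torch.utils.data.Dataset`, not torchvision or FFCV API.

```python
# Hedged sketch: cache samples in decoded form so repeated epochs
# skip the decode step. All names here are hypothetical.

class PreDecodedDataset:
    def __init__(self, samples, decode):
        # Decode once, up front: trades memory/storage for access speed.
        self._decoded = [decode(s) for s in samples]

    def __len__(self):
        return len(self._decoded)

    def __getitem__(self, index):
        # No decode at access time; just return the cached sample.
        return self._decoded[index]


calls = []

def decode(raw):
    calls.append(raw)        # count how often decoding actually runs
    return [b for b in raw]  # toy stand-in for JPEG -> pixel decoding

ds = PreDecodedDataset([b"ab", b"cd"], decode)
for _ in range(3):           # three "epochs" of full reads
    _ = [ds[i] for i in range(len(ds))]

print(len(calls))  # -> 2: decode ran once per sample, not once per epoch
```

The trade-off is exactly the one discussed below for FFCV: access is cheap because nothing is decoded at read time, but the cached representation is much larger than the encoded one.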

pmeier commented 2 years ago

When the news about FFCV first broke, I ran a quick benchmark against our new datasets. TL;DR: their custom dataset format seems to simply save the original data in a decoded state. That way you don't have to decode at runtime and get a massive performance boost, but at the cost of massively increased storage. In my benchmark I achieved a speed-up similar to the ~7x that FFCV reports, with a ~5x increase in storage.
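
The storage side of that trade-off is easy to demonstrate with NumPy and Pillow (both torchvision dependencies); the image and the exact sizes below are illustrative only, not FFCV's actual format.

```python
# Hedged sketch: compare the size of a sample stored encoded (JPEG,
# the usual on-disk format) vs. decoded (raw pixels, FFCV-style).
import io

import numpy as np
from PIL import Image

# Hypothetical sample: a smooth 256x256 RGB gradient (compresses well).
x = np.linspace(0, 255, 256).astype(np.uint8)
array = np.stack([np.tile(x, (256, 1))] * 3, axis=-1)

# Encoded form: what a typical image dataset stores on disk.
buffer = io.BytesIO()
Image.fromarray(array).save(buffer, format="JPEG", quality=90)
jpeg_bytes = buffer.getvalue()

# Decoded form: what a pre-decoded dataset stores.
raw_bytes = array.tobytes()

# Reading the encoded sample pays a JPEG decode on every epoch...
decoded = np.asarray(Image.open(io.BytesIO(jpeg_bytes)))

# ...while the pre-decoded sample is just a view over raw memory.
restored = np.frombuffer(raw_bytes, dtype=np.uint8).reshape(array.shape)

print(len(raw_bytes), len(jpeg_bytes))  # decoded form is far larger
```

How much larger the decoded form is depends entirely on how well the data compresses, which is why the ~5x figure above is dataset-specific.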