msaroufim opened 2 years ago
Thanks Mark for opening the issue.
From my very brief review of FFCV, there are a few main components.
> I have no idea about how the underlying implementation works.
Some of these components might be in scope for torchvision, but a custom DataLoader typically would not be. This might require involving other teams, like the one in charge of DataLoaderV2. So it's not yet completely clear to me what would be relevant and what would not, but this is indeed something we could look into.
Agreed, I think I'll investigate them in this order.
When the news about FFCV first broke, I did a quick benchmark against our new datasets. TL;DR: Their custom dataset format seems to simply save the original data in a decoded state. This way you don't have to do the decoding at runtime and get a massive performance boost. Of course this comes at the cost of massively increased storage requirements. In the benchmark I achieved a similar ~7x speed-up to the one FFCV reports, with a ~5x increase in required storage.
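To make the trade-off concrete, here is a minimal sketch of the idea. This is not FFCV's actual on-disk layout (which I haven't studied); it just illustrates "store samples pre-decoded so loading is a plain seek + read", with zlib standing in for a real image codec like JPEG. All names here (`write_predecoded`, `read_predecoded`) are made up for illustration.

```python
import os
import struct
import tempfile
import zlib

def write_predecoded(path, samples):
    """Write raw (already decoded) samples with length prefixes; return their offsets."""
    offsets = []
    with open(path, "wb") as f:
        for raw in samples:
            offsets.append(f.tell())
            f.write(struct.pack("<I", len(raw)))  # 4-byte little-endian length prefix
            f.write(raw)
    return offsets

def read_predecoded(path, offsets, i):
    """Loading sample i is just seek + read; no decode step at runtime."""
    with open(path, "rb") as f:
        f.seek(offsets[i])
        (n,) = struct.unpack("<I", f.read(4))
        return f.read(n)

# Synthetic, highly compressible "decoded image" data.
samples = [bytes([i % 7] * 64_000) for i in range(8)]

tmp = tempfile.mkdtemp()
raw_path = os.path.join(tmp, "data.bin")
offsets = write_predecoded(raw_path, samples)

# The alternative: keep samples encoded (zlib as a stand-in for JPEG)
# and pay a decompress cost on every load.
encoded = [zlib.compress(s) for s in samples]
raw_size = os.path.getsize(raw_path)
enc_size = sum(len(e) for e in encoded)

assert read_predecoded(raw_path, offsets, 3) == samples[3]  # lossless round-trip
print(f"pre-decoded: {raw_size} bytes, encoded: {enc_size} bytes "
      f"(more storage, but zero decode work at load time)")
```

The exact storage multiplier depends entirely on how compressible the data is; for natural images stored as JPEG, something in the ballpark of the ~5x figure above is plausible.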
🚀 The feature
Integrate https://github.com/libffcv/ffcv for accelerated image decoding, preprocessing, and loading.
Motivation, pitch
I maintain torchserve, and we've recently had customers complain about slow image preprocessing and decoding (https://github.com/pytorch/serve/issues/1546); the performance implications are large. It's possible for me to solve this locally in torchserve, but solving it a level higher in torchvision means anyone can benefit from the improvements.
Summarizing discussion with @NicolasHug
Alternatives
Some alternatives exist, like DALI. There's also the do-nothing alternative, where we just provide a tutorial on FFCV instead of having a tight integration.
Additional context
If this is a reasonable first issue for torch/vision, I can pick this up.