pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Speed up JPEG decoding by allowing resize during decode #8986

Open gyf304 opened 1 month ago

gyf304 commented 1 month ago

🚀 The feature

Torchvision's read_image currently decodes JPEG images at full resolution. However, both libjpeg and libjpeg-turbo support decoding at lower resolutions (1/2, 1/4, 1/8 of the original size).

Introducing a size_hint parameter would allow users to specify an approximate target size. Torchvision would then select the scale factor whose output is closest to, but not smaller than, that size, and downscale the JPEG image during decoding.

Example Usage:

from torchvision.io.image import decode_image
tensor = decode_image("image.jpeg", size_hint=(224, 224))

Additional context

Benchmark

We implemented a proof-of-concept and benchmarked decoding a 1920x1080 JPEG image into 960x540, comparing torchvision's full-resolution decode followed by a resize against our proof-of-concept decoding with a size hint:

Benchmark results (1000 iters):

9.91s call     .../test_jpeg.py::test_torchvision_image_load_with_resize_960_540
4.00s call     .../test_jpeg.py::test_fastjpeg_image_load_with_size_hint_960_540

That is a ~2.5x speedup (9.91s / 4.00s ≈ 2.48).

I'm happy to contribute a patch if people consider this useful.

NicolasHug commented 1 month ago

Thank you for the feature request @gyf304. I think this is something we'll eventually want to enable.

The main challenge here isn't to implement the feature, it's to expose it in a way that isn't going to provide users with a massive footgun.

It is very important for the resizing algorithm (bilinear vs bicubic vs nearest neighbor, with or without antialiasing) to be consistent between training and inference time. When it's not, model accuracy regresses in ways that are very difficult to debug. This has caused a lot of confusion for users over time (e.g. back when the default of the antialias parameter of torchvision's Resize wasn't consistent between PIL images and tensors).

So, if we're going to expose a resizing mechanism outside of torchvision's Resize(), e.g. in decode_image(), we'll have to ensure that the new resizing implementation is consistent with what Resize() exposes, and we should make it hard for users to end up with inconsistent resizing parameters.

gyf304 commented 1 month ago

@NicolasHug I accidentally fat-fingered and clicked "Comment and Close Issue" - GitHub unfortunately does not allow me to reopen this issue.

I think this concern can be mitigated by:

  1. Understanding and documenting how resize during decode works
  2. Understanding and documenting its intended use
  3. Designing the API to minimize potential issues

1. Understanding How Resize During Decode Works

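JPEG resize during decode is performed at the IDCT (inverse discrete cosine transform) level, meaning it operates in the frequency domain: for a 1/2, 1/4 or 1/8 scale, the decoder keeps only the lowest-frequency 4x4, 2x2 or 1x1 DCT coefficients of each 8x8 block and runs a correspondingly smaller inverse DCT. Discarding the high frequencies this way is somewhat comparable to applying a sinc (ideal low-pass) filter*.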

* This isn’t entirely accurate, as JPEG processes 8x8 blocks, whereas a true sinc filter is unbounded.
* A Lanczos filter, previously also exposed as the ANTIALIAS filter in Pillow, can be seen as a truncated approximation of a sinc filter.
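To make the frequency-domain mechanism concrete, here is a minimal NumPy sketch of the idea (not libjpeg's actual code): it downscales a single 8x8 block by keeping only its top-left m x m DCT coefficients and running an m x m inverse DCT. It assumes floating-point orthonormal DCTs and ignores quantization; libjpeg and libjpeg-turbo use optimized fixed-point scaled IDCTs instead. The helper names dct_matrix and downscale_block are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def downscale_block(block: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical helper: downscale an n x n block to m x m in the DCT domain."""
    n = block.shape[0]
    cn, cm = dct_matrix(n), dct_matrix(m)
    coeffs = cn @ block @ cn.T       # forward 2D DCT of the full block
    low = coeffs[:m, :m] * (m / n)   # keep low frequencies; rescale to preserve brightness
    return cm.T @ low @ cm           # m x m inverse DCT -> downscaled pixels


rng = np.random.default_rng(0)
block = rng.uniform(0, 255, size=(8, 8))
# With m = 1 (the 1/8 scale) only the DC coefficient survives, i.e. the block average.
print(np.isclose(downscale_block(block, 1)[0, 0], block.mean()))  # True
```

The m = 4, 2 and 1 cases correspond to the 1/2, 1/4 and 1/8 scales; m = 8 reproduces the block unchanged.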

2. Understanding Its Intended Use

Since JPEG resize during decode is limited to predefined scaling factors, the final output size may not precisely match the requested size_hint.

For example, calling decode_image("image.jpg", size_hint=(224, 224)) on a JPEG image would guarantee a decoded image of at least (224, 224), unless the original is already smaller. If an exact size is required, users should follow up with Resize((224, 224)).
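The size_hint rule can be sketched as follows. This is a hypothetical helper, not torchvision code; it assumes size_hint is (height, width) as in torchvision's Resize, that only the 1/8, 1/4, 1/2 and 1/1 factors are offered, and that scaled dimensions are rounded up, as libjpeg does.

```python
import math
from fractions import Fraction

# Candidate scale factors, from most to least aggressive downscaling.
SCALES = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1, 1))


def scaled_size(height: int, width: int, scale: Fraction) -> tuple[int, int]:
    """Output (height, width) at a given scale, rounding up like libjpeg."""
    return math.ceil(height * scale), math.ceil(width * scale)


def choose_scale(height: int, width: int, size_hint: tuple[int, int]) -> Fraction:
    """Pick the smallest scale whose output is at least size_hint in both dimensions."""
    target_h, target_w = size_hint
    for scale in SCALES:
        out_h, out_w = scaled_size(height, width, scale)
        if out_h >= target_h and out_w >= target_w:
            return scale
    # Image is already smaller than the hint: decode at full resolution (never upscale).
    return Fraction(1, 1)


scale = choose_scale(1080, 1920, (224, 224))
print(scale, scaled_size(1080, 1920, scale))  # 1/4 (270, 480)
```

For a 1920x1080 image, 1/8 would give 240x135, which is too short, so 1/4 is chosen and a final Resize((224, 224)) brings the 480x270 result to the exact size.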

It isn't realistic to expect that:

resize(decode_image("image.jpg", size_hint=(224, 224)), (224, 224))

will always yield the same result as:

resize(decode_image("image.jpg"), (224, 224))

However, the difference should be minimal.

3. Designing the API to Prevent Issues

This feature, as proposed, is opt-in and does not modify how Resize() functions. Additionally, its docstring can include a clear warning about its implications to help users make informed decisions.

vadimkantorov commented 2 weeks ago

Maybe it's also worth allowing users to pass the predefined scale factors (1/2, 1/4, 1/8) directly instead of size_hint.
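One way to support both options, sketched with hypothetical names (resolve_scale, scale, size_hint) that are not part of torchvision: accept either an explicit scale factor or a size hint, but not both, and restrict scale to the 1/2, 1/4 and 1/8 factors discussed here (plus 1 for no scaling). The (height, width) order of size_hint is an assumption.

```python
import math
from fractions import Fraction
from typing import Optional, Tuple, Union

ALLOWED_SCALES = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1, 1))


def resolve_scale(
    height: int,
    width: int,
    *,
    scale: Union[Fraction, float, None] = None,
    size_hint: Optional[Tuple[int, int]] = None,
) -> Fraction:
    """Hypothetical API: derive the decode scale from exactly one of scale or size_hint."""
    if scale is not None and size_hint is not None:
        raise ValueError("pass either scale or size_hint, not both")
    if scale is not None:
        scale = Fraction(scale)  # 0.5 -> Fraction(1, 2) exactly
        if scale not in ALLOWED_SCALES:
            raise ValueError(f"scale must be one of {[str(s) for s in ALLOWED_SCALES]}")
        return scale
    if size_hint is None:
        return Fraction(1, 1)  # default: full-resolution decode
    target_h, target_w = size_hint
    for s in ALLOWED_SCALES:  # most aggressive downscaling first; libjpeg rounds sizes up
        if math.ceil(height * s) >= target_h and math.ceil(width * s) >= target_w:
            return s
    return Fraction(1, 1)
```

An explicit scale makes the decoded size depend only on the image dimensions and the chosen factor, which may make it easier to keep training and inference pipelines consistent.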