google / wuffs

Wrangling Untrusted File Formats Safely
Other
4.16k stars 131 forks source link

A question regarding auxiliary C++ API #131

Closed dev0x13 closed 1 year ago

dev0x13 commented 1 year ago

What is the reason for this limitation on allowed pixel formats in wuffx_aux::DecodeImage? Thanks!

nigeltao commented 1 year ago

Because the pixel swizzler (converting from one pixel format to another) only supports a limited number of destination formats.

https://github.com/google/wuffs/blob/49f50cca91c458783acdd5617ba3012eb2237a2c/internal/cgen/base/pixconv-submodule-regular.c#L5153-L5225 is the relevant code. Note that the switch statement looks at the source pixel format and you have to drill into the functions (e.g. https://github.com/google/wuffs/blob/49f50cca91c458783acdd5617ba3012eb2237a2c/internal/cgen/base/pixconv-submodule-regular.c#L4694-L4771) to see supported destination pixel formats.

For N pixel formats, there are N*N different (source, destination) combinations. It would be a lot of code (not just to write, but there's also a binary-size cost) to support all of the combinations. Instead, we support a smaller subset: supported source pixel formats are based on what's 'natural' for the image file formats (JPEG, PNG, etc) and supported destination pixel formats are based on what's commonly used.

Is there a particular destination format that you'd like added, or are you just curious?

dev0x13 commented 1 year ago

Thank you very much for such a detailed response! I was initially interested in WUFFS_BASE__PIXEL_FORMAT__RGB destination pixel format, which appeared to be working fine for each input image from my PNG dataset, so I was curious why the auxiliary API doesn't cover this format too despite it's being quite 'natural' (for historical reasons) when it comes to computer vision models, especially neural networks. But now I see that WUFFS_BASE__PIXEL_FORMAT__RGB has somewhat limited support (lots of TODOs across the swizzler dispatching code), so it's actually might be safer to disable it at a higher level. However, it seems to me like for wuffs_aux::DecodeImage code it might make sense to just pass through the destination pixel format parameter and let the actual decoder itself fail later while selecting a pixel swizzler. Or am I getting this wrong? Anyway, thanks again and I'm closing the issue since the initial question was answered.

nigeltao commented 1 year ago

I was initially interested in WUFFS_BASE__PIXEL_FORMAT__RGB destination pixel format, which appeared to be working fine for each input image from my PNG dataset, so I was curious why the auxiliary API doesn't cover this format too

WUFFS_BASE__PIXEL_FORMAT__RGB works for most opaque (no alpha channel) PNG images because that's "the image file's natural pixel format" from https://github.com/google/wuffs/blob/4dfd70928b44d7828b334c06279cf9acfe5850d7/internal/cgen/auxiliary/image.hh#L129-L131

In terms of the bytes in the file format, PNG uses RGB or RGBA byte order (and big-endian encoding) even when e.g. Windows/x86 typically uses BGR or BGRA and little-endian.

As of a few days ago, decoding WUFFS_BASE__PIXEL_FORMAT__RGB didn't work for all PNGs, because of the "lots of TODOs across the swizzler dispatching code" that you noticed. Those TODOs were fixed since then, by https://github.com/google/wuffs/commit/8954281a4e6a070b9204ac54bc8aa906e69d5300

It wasn't zero new code, but it was a relatively small amount of new code, since we already supported WUFFS_BASE__PIXEL_FORMAT__BGR and WUFFS_BASE__PIXEL_FORMAT__RGBA_{NONPREMUL,PREMUL}.


despite it's being quite 'natural' (for historical reasons) when it comes to computer vision models, especially neural networks

You probably know more about image decoding in neural networks (e.g OpenCV) than I do. Do you know if they would work with a BGR pixel format or do they only speak RGB?


it seems to me like for wuffs_aux::DecodeImage code it might make sense to just pass through the destination pixel format parameter and let the actual decoder itself fail later while selecting a pixel swizzler. Or am I getting this wrong?

That might make a (small) difference, in theory, but in practice, the decoders just delegate to the pixel swizzler code and it seemed better to fail earlier. It might be worth revisiting that in the future, though. Thanks for the feedback.

dev0x13 commented 1 year ago

Thank you very much for the explanation and for efforts for adding RGB support!

You probably know more about image decoding in neural networks (e.g OpenCV) than I do. Do you know if they would work with a BGR pixel format or do they only speak RGB?

Generally speaking, it depends. Some of the solutions work with RGB and some work with BGR. I'd say they both are equally popular as there is no common convention for channel ordering, so support for both formats is very useful.

That might make a (small) difference, in theory, but in practice, the decoders just delegate to the pixel swizzler code and it seemed better to fail earlier. It might be worth revisiting that in the future, though. Thanks for the feedback.

That makes sense, thank you for explaining this!