Decode to CHW format already

google / wuffs

Wrangling Untrusted File Formats Safely

Decode to CHW format already #64

Closed SofaScience closed 2 years ago

SofaScience commented 2 years ago

For now, the output of wuffs_aux::DecodeImage (for BGR PNGs) is an HWC array. Is there some way to make it CHW, that is, with the channels separated in memory? This format is becoming more and more popular because of neural networks, and the cost of HWC -> CHW conversion is very high.

pjanx commented 2 years ago

Your terminology is extremely confusing and alien, could you explain in simple terms?

Do you mean planar instead of interleaved?

SofaScience commented 2 years ago

@pjanx I am sorry for that. You are right, I mean separate planes. That's very important in many deep learning applications now (for example, PyTorch and TensorRT use this format). However, the interleaved -> planar conversion is not cheap (for example, with cv::split).

pjanx commented 2 years ago

See usages of wuffs_base__pixel_format__is_planar() and wuffs_base__pixel_buffer__plane()--no, it's not currently supported.

Your best option in the foreseeable future is to do some custom post-processing, which will also give you flexibility with the bit depth, and with the choice of channels. Understand that image formats are wild--the input may be indexed (GIF, PNG), RGB/A interleaved (PNG), or even YCbCr planar with differently sized planes (lossy formats, 4:2:2 and 4:2:0). That's a lot of paths to care about. There are bigger concerns in the image library right now.
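
As an illustration of that flexibility, the sketch below converts interleaved 8-bit BGRA rows into planar 32-bit float RGB, dropping alpha and reordering channels in the same pass. It assumes the decoded buffer is 8-bit BGRA with a known row stride in bytes; `bgra8_to_planar_rgb_f32` is a hypothetical helper, not a Wuffs API, and the [0, 1] scaling is just one possible normalization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts interleaved 8-bit BGRA rows into planar float RGB in [0, 1].
// `stride` is the byte distance between row starts (assumed >= width * 4).
// Hypothetical helper for illustration; not part of Wuffs.
std::vector<float> bgra8_to_planar_rgb_f32(const std::uint8_t* src,
                                           std::size_t height,
                                           std::size_t width,
                                           std::size_t stride) {
  assert(stride >= width * 4);
  const std::size_t plane = height * width;
  std::vector<float> dst(3 * plane);
  float* r = dst.data();   // plane 0: red
  float* g = r + plane;    // plane 1: green
  float* b = g + plane;    // plane 2: blue
  for (std::size_t y = 0; y < height; ++y) {
    const std::uint8_t* row = src + y * stride;
    for (std::size_t x = 0; x < width; ++x) {
      const std::size_t i = y * width + x;
      b[i] = row[4 * x + 0] / 255.0f;
      g[i] = row[4 * x + 1] / 255.0f;
      r[i] = row[4 * x + 2] / 255.0f;
      // row[4 * x + 3] is alpha; it is dropped in this sketch.
    }
  }
  return dst;
}
```

Indexed, 16-bit, or subsampled YCbCr inputs would each need their own variant of this loop, which is the "lot of paths" problem described above.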

nigeltao commented 2 years ago

Yeah, custom post-processing is the way to go for now.

I'm surprised that HWC <-> CHW conversion isn't cheap (if HWC and CHW mean what I think they mean; I'd never seen those terms before today). I'm not very familiar with PyTorch or TensorRT, but the conversion should be very SIMD-friendly.

How do existing PyTorch or TensorRT programs load (and convert) PNGs? Do they use libpng? I don't see any PNG_TRANSFORM_FOOBAR definitions in libpng to split into separate R, G, B, A planes, so I'm guessing it'd have to be a post-processing step. Can you just do the same post-processing step?
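
For reference, a generic version of such a post-processing step might look like the sketch below. It is a plain scalar loop over a tightly packed 8-bit buffer, not a SIMD implementation, and `interleaved_to_planar` is a hypothetical name rather than anything provided by Wuffs or libpng.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits a tightly packed interleaved buffer (HWC) into planes (CHW).
// `channels` would be 3 for BGR or 4 for BGRA.
// Hypothetical helper for illustration; not a Wuffs or libpng API.
std::vector<std::uint8_t> interleaved_to_planar(
    const std::vector<std::uint8_t>& src,
    std::size_t height, std::size_t width, std::size_t channels) {
  assert(src.size() == height * width * channels);
  const std::size_t plane = height * width;  // elements per channel plane
  std::vector<std::uint8_t> dst(src.size());
  for (std::size_t p = 0; p < plane; ++p) {
    for (std::size_t c = 0; c < channels; ++c) {
      dst[c * plane + p] = src[p * channels + c];
    }
  }
  return dst;
}
```

For a 1x2 BGR image stored as `{1, 2, 3, 4, 5, 6}`, this yields `{1, 4, 2, 5, 3, 6}`: the B plane, then the G plane, then the R plane.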

SofaScience commented 2 years ago

I'm not sure what they use, but OpenCV has a reasonably well-optimized split function for both CPU and CUDA. However, a copy is still a copy. It could be avoided in theory, and on big images the difference isn't small. But thanks for the answers.

pjanx commented 2 years ago

For what it's worth, it's a valid request, but it's unlikely to be fulfilled unless someone steps in to work on it themselves.

Another thought: if you don't control the image source, you may want to colour manage the images, which means extra processing anyway. And if you do control it, you may want to pre-normalize the images to whatever format you see fit.