libjxl/libjxl: JPEG XL image format reference implementation
BSD 3-Clause "New" or "Revised" License

Compress efficiently more than 3 color channels #1245

daniellandau opened this issue 2 years ago

daniellandau commented 2 years ago

Is your feature request related to a problem? Please describe. I have 200 color channels of the same image (hyperspectral satellite imagery) and I'd like to compress it as efficiently as possible losslessly. When I encode exactly 3 channels, I get better compression than with a single channel, but I haven't found a way to ask libjxl to encode more channels at the same time.

Describe the solution you'd like Maybe what I want is already possible and I'm missing pointers to documentation to input the channels correctly, or define a new custom colorspace/colorencoding/something similar? Or maybe what I want simply isn't possible as the code in color_encoding_internal.cc seems to talk about primaries r/g/b? It seems to work very well for 3 channels even though the colors I tested with were nothing near red/green/blue in physical reality.

Describe alternatives you've considered I've tried adding more channels as new frames, or as extra channels in the same frame. Both of these seem to work (didn't try decoding), but provide the same compression as compressing each channel to a separate single channel file.

Additional context I'm currently testing with the exported C API, but if the solution is hacking the internals (or better yet, improving them for everybody) then that's fine for me too.

jonsneyers commented 2 years ago

Currently the encoder only applies RCTs (reversible color transforms) like YCoCg, as well as palette transforms, to the first 3 channels. The relevant encoder heuristics that decide what transforms to apply start here: https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_modular.cc#L616 for the global (full image) transforms and here: https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_modular.cc#L1293 for the local (per-group) transforms.

The jxl bitstream does allow applying RCTs on any 3 consecutive channels, and multiple RCTs can be signaled. There are also RCTs that do nothing but permute the channels, which can be used to reorder the channels in case you want to apply RCTs on non-consecutive channels. I would try to order the channels in a way that puts channels together that are likely to correlate well, and then extend the logic to try RCTs not just on the first 3 channels, but on the next groups of 3 channels too.

I don't know to what extent Palette helps for your use case. The current encoder only tries single-channel palette (useful if not all values in the range 0..2^n-1 are actually used), N-channel palette on all channels (e.g. RGBA), and (N-1)-channel palette on all channels except the last one (e.g. RGB). Palettes can be used globally (whole image) or per-group. Palette can be very effective if you have groups (regions of 256x256 pixels by default) that don't contain a lot of combinations of values across different channels. But I assume that 200-channel palettes and 199-channel palettes are not going to be useful, so it's probably better to only try it on smaller subsets of channels (they have to be consecutive), or not at all.

One thing you should certainly try is setting JXL_ENC_FRAME_SETTING_MODULAR_NB_PREV_CHANNELS to a higher value than the default 0. This controls how many of the previously-encoded channels are taken into account in the entropy coding context model. I think it will only have an effect at the higher effort settings though, so speed could be a problem.

jonsneyers commented 2 years ago

See also this twitter thread: https://twitter.com/alexjc/status/1504113686125887495?s=20&t=0OVpggVlLT0Mth5j0-RqKQ

jonsneyers commented 2 years ago

Something else: currently we have these types in the spec for extra channels:

enum class ExtraChannel : uint32_t {
  // First two enumerators (most common) are cheaper to encode
  kAlpha = JXL_CHANNEL_ALPHA,
  kDepth = JXL_CHANNEL_DEPTH,

  kSpotColor = JXL_CHANNEL_SPOT_COLOR,
  kSelectionMask = JXL_CHANNEL_SELECTION_MASK,
  kBlack = JXL_CHANNEL_BLACK,  // for CMYK
  kCFA = JXL_CHANNEL_CFA,      // Bayer channel
  kThermal = JXL_CHANNEL_THERMAL,
  kReserved0 = JXL_CHANNEL_RESERVED0,
  kReserved1 = JXL_CHANNEL_RESERVED1,
  kReserved2 = JXL_CHANNEL_RESERVED2,
  kReserved3 = JXL_CHANNEL_RESERVED3,
  kReserved4 = JXL_CHANNEL_RESERVED4,
  kReserved5 = JXL_CHANNEL_RESERVED5,
  kReserved6 = JXL_CHANNEL_RESERVED6,
  kReserved7 = JXL_CHANNEL_RESERVED7,
  // disambiguated via name string, raise warning if unsupported
  kUnknown = JXL_CHANNEL_UNKNOWN,
  // like kUnknown but can silently be ignored
  kOptional = JXL_CHANNEL_OPTIONAL
};

We are currently revising the spec for a 2nd edition. Would you like to add one or more types here for your use case? You can always do kOptional + channel name, but still...

daniellandau commented 2 years ago

Thanks for the pointers, I'll research them!

As for the question about extra channels: I don't know about adding types there. Hyperspectral imaging from satellites might use extra channels for cloud cover or something, but what I'm currently after is just compressing the actual data, which is just a bunch of different colors in the visible, infrared, or ultraviolet, so semantically their place would be in the primary channels, if I understand the system correctly.

jonsneyers commented 2 years ago

There are only 1 (grayscale) or 3 (RGB) primary channels, all the rest has to go in extra channels. So semantically it would make sense to put the visible colors in the primary RGB channels, and all other colors in extra channels. I don't know enough about this use case, but perhaps something like kHyperSpectral as a type, and then some kind of naming convention in the (freeform UTF-8) channel name to denote the actual frequency band that channel corresponds to?

What is the current practice for storing this type of data? As far as I know, only TIFF and JPEG 2000 can handle that many channels, is that what is being used? Plus probably some kind of metadata to indicate the meaning of the channels?

daniellandau commented 2 years ago

Reading the code and testing stuff out led me sort of to that understanding, but Wikipedia has

Up to 4100 channels (i.e. grayscale or RGB), optional alpha, and up to 4096 "extra" channels

Which confused me.

Some options used are GeoTIFF, SAFE (a directory with XML and JPEG 2000 files per channel) and hdf5.

daniellandau commented 2 years ago

At the moment I'm looking for a compression for getting the data from orbit to ground, but JPEG XL is a good candidate for archival too.

jonsneyers commented 2 years ago

Wikipedia was wrong, I just fixed it. Thanks for the find!

So of course we don't aim to fully replace existing approaches (detailed metadata for all use cases is not in the scope of JPEG XL), but we do already have an option to store uncompressed or compressed XML in the jxl file format (the corresponding ISOBMFF boxes are xml and brobxml), which is currently used for XMP but could just as well contain the SAFE metadata. Also, GeoTIFF, as far as I understand, is basically a set of TIFF tags, so it could be considered a special case of Exif metadata and written in an Exif or brobExif box.

Obviously I don't expect that all JPEG XL viewers will understand GeoTIFF/SAFE metadata and visualize things in a suitable way (that would be way too much of a rabbit hole to go into), but if some kind of "preview" image (say a visible light image, or some reasonable visualization of other bands) ends up in the RGB channels, and all the other data goes into extra channels of a specific type, say kGeo or kSatellite, then at least generic viewers / info tools can show the preview and indicate to the user that the file contains more information that has something to do with satellite imagery but they will need more specialized software to work with that.

We have a similar situation for some of the other extra channel types, where basically they get ignored by generic decoders, but you are able to know what the channel represents and could in principle make a decoder to visualize it. We do specify some basic semantic interpretation info in the jxl spec itself for the named extra channels, e.g. kThermal has values that are to be interpreted in Kelvin, kSelectionMask has the semantics that 0 means not selected and 1 means selected, etc. For spot colors, the color to render these channels in is part of the image header (it can basically be any RGBA color), and the decoder will render those channels. Besides alpha and spot colors, the other extra channels are currently not visualized by any viewer, but that could change in the future (e.g. one could make a viewer that allows seeing the kThermal channel as an overlay that can be optionally displayed, using a color gradient or something).

So I wonder if we should do this for satellite imagery too, and if so, if we should add any 'key' metadata to the ExtraChannelInfo header.

SENTINEL-2 data are acquired on 13 spectral bands in the VNIR and SWIR:

  • four bands at 10 m: 490 nm (B2), 560 nm (B3), 665 nm (B4), 842 nm (B8)
  • six bands at 20 m: 705 nm (B5), 740 nm (B6), 783 nm (B7), 865 nm (B8a), 1 610 nm (B11), 2 190 nm (B12)
  • three bands at 60 m: 443 nm (B1), 945 nm (B9) and 1 375 nm (B10)

This makes me think that perhaps the wavelength in nm could be a useful field for extra channels of type kSatellite (or kSpectral or kBand whatever a good name for this channel type would be). We could express it as a float16 header field so it only adds 2 bytes to the header while it can express a large range from X-rays to radio waves that way (so perhaps medical imagery could also use this channel type then).

One issue I see here is that the resolutions of these channels are not the same; jxl does have the concept of extra channels being at a lower resolution than the main image, but we only allow power-of-two proportions. If I understand correctly, here you have 4 channels at the finest resolution, then 6 channels at 1:2 resolution (each pixel covering 2x2 pixels of the first 4 channels), and then 3 channels at 1:6 resolution (covering 6x6 pixels of the finest resolution, or 3x3 pixels at 1:2 resolution). The 1:6 is somewhat annoying; in jxl this would have to be represented at 1:2 resolution where 3x3 blocks just happen to have the same value (or alternatively, resampled to 1:4 resolution where 2 original pixels are represented as 3 encoded pixels).

Obviously this wouldn't be enough information to fully specify how to interpret the data (for that you'd need to look at the XML/GeoTIFF metadata), but at least it's a lot more precise than just "kOptional" (i.e. "some channel, no idea what it is though").

What do you think? Am I making some kind of sense? As I said, I know nothing at all about this specific use case, I'm just trying to see if and how we should add a new extra channel type to the spec.

daniellandau commented 2 years ago

Thanks for fixing the text on Wikipedia!

Some hyperspectral/multispectral images do contain RGB in the visible light spectrum, so using them for a preview is definitely doable. For example, SENTINEL-2 files contain a "TCI", which stands for True Color Image. Not all spectral images are in the visible light spectrum, though; there the preview stored in the primary channels would perhaps be the single channel with the most detail, or an arbitrarily chosen false-color image.

I happen to work with satellite data, but hyperspectral cameras are used on the ground too and then the extra channels wouldn't carry any special geo metadata. So if you want to name the extra channels, then I guess "channel" would be a suitable name, except all extra channels are channels already, so maybe "band" then.

The different spatial resolutions are a thing on SENTINEL-2, yes, and in other instruments too, but not all. I don't think it makes sense to store the different-resolution bands in a single image file. That's not what they actually do either: they have one band per JPEG 2000 file.

The absolute killer feature would be to compress more than 3 colors efficiently. I think what you're writing makes sense, but I don't think you want to start codifying anything about metadata in the spec based on my feedback alone. Compressing hyperspectral data is a topic of active research, and there are many, many papers with algorithms and no code. For now I'm probably just going to do bundles of 3 bands per jxl file to get the best compression ratio available now, but if we could get that ratio for N bands (or probably even better, since your suggested JXL_ENC_FRAME_SETTING_MODULAR_NB_PREV_CHANNELS did have an effect too), I think people would come, and then we could hash out the metadata questions.

jonsneyers commented 2 years ago

In terms of reversible color transforms, we don't have direct transforms for N channels, but we can chain multiple transforms in arbitrary ways to construct "composite" color transforms that cover more colors. For example, say you start with 5 channels: A, B, C, D, E

Then you can apply a "subtract green from red and blue" transform on the first three channels to get: A-B, B, C-B, D, E

Then you could e.g. apply a "subtract avg(red,blue) from green" transform on the middle three channels to get: A-B, B, C-B-(B+D)/2, D, E

and so on. This kind of chaining of decorrelating transforms could help for compression (but of course it can also just pollute channels with noise from another band). The search space is large though (due to combinatorial explosion), so I think probably the best way to proceed would be to manually try some chains of RCTs (based on domain knowledge/experience) to see if you can find one that outperforms the "3 channels at a time" approach.

RubenKelevra commented 2 years ago

Reading the code and testing stuff out led me sort of to that understanding, but Wikipedia has

Up to 4100 channels (i.e. grayscale or RGB), optional alpha, and up to 4096 "extra" channels

Which confused me.

Some options used are GeoTIFF, SAFE (a directory with XML and JPEG 2000 files per channel) and hdf5.

Agreed that was confusing. I changed it in the meantime (as I searched for info about this as well):

https://en.wikipedia.org/w/index.php?title=JPEG_XL&diff=1090865122&oldid=1090578546&variant=en

RubenKelevra commented 2 years ago

@jonsneyers what about adding the ability to specify the frequency of a channel and the filter width/sensitivity width?

RGB is nothing more than 3 broad-range image sensors, so additional color channels could just be specified with their frequency and width.

This would allow the encoder to have a look at neighbouring channels and decide if it's worth doing diffs to them (for example).

The channels which resemble RGB the closest could automatically be decoded as a visible image of some sort, as a preview.

I'm not sure how to properly lay this out in the bitstream, but the RGB channels could just point to a different extra channel, or the channels closest to the centers of R, G and B could be put into the regular RGB channels.

If all channels are out of the range of R, G and B, the user would need to specify a primary channel (or just the first one is selected and put as a grey channel into the image).

zougloub commented 6 months ago

There is hyperspectral, and then there is multimodal, where lossy compression would need to understand what a "distance" means (and all the psychovisual / subjective metrics may be inappropriate), unless some kind of normalization happens externally and is available via out-of-band metadata. In order to avoid opening a can of worms, maybe consider the restricted case of compressing a number of arbitrary channels, using a simple distance metric.

I've been using zfp-based things for similar data, and perversions like using a video codec when it's possible to reduce the pixel bit depth, with verification of the resulting loss; here's an example with https://github.com/gistairc/HS-SOD/ image 0078 (81 planes):

https://github.com/libjxl/libjxl/assets/998040/8dcb4be2-2e1d-4f86-b6b3-b57b0c44ac59

(attached visualization: matrix-small)