openclimatefix / Satip

Satip contains the code necessary for retrieving, transforming and storing EUMETSAT data
https://satip.readthedocs.io/
MIT License

Experiment with using lossless JPEG-XL (colorspace=YUV 400) #67

Closed JackKelly closed 2 years ago

JackKelly commented 2 years ago

Detailed Description

JPEG-XL is the "new kid on the block" for image compression. And it can losslessly compress greyscale images (using colorspace YUV 400). It might be great for compressing satellite images into our Zarr datasets.

Context

See issue #45 for previous experiments and notes.

This great comparison of lossless compression using JPEG-XL, WebP, AVIF, and PNG suggests JPEG-XL wins.

Possible Implementation

imagecodecs includes an interface to JPEG-XL.

Next step is to try manually compressing images using the cjxl app (not ImageMagick).

If that looks good, then create a stand-alone little adapter library to adapt imagecodecs for use with Zarr. Here's a super-simple little Python library (just 51 lines of code!) which enables JPEG 2000 compression in Zarr using imagecodecs. Maybe it'd be possible to use the same pattern, but for JPEG-XL? UPDATE: @cgohlke has already implemented this in imagecodecs! (See comment below.)
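For reference, the adapter pattern itself is tiny: a class with a `codec_id`, `encode` and `decode`, registered with numcodecs. A self-contained sketch of the same shape, using zlib as a stand-in codec (a real adapter would subclass numcodecs.abc.Codec and call imagecodecs' JPEG-XL functions; all names here are illustrative):

```python
import zlib

import numpy as np


class DemoImageCodec:
    """Illustrative codec with the numcodecs-style interface.

    A real Zarr adapter would subclass numcodecs.abc.Codec, call
    imagecodecs' encode/decode functions inside encode()/decode(),
    and be registered via numcodecs.registry.register_codec().
    zlib stands in here so the sketch is self-contained.
    """

    codec_id = "demo_zlib_image"  # hypothetical codec id

    def __init__(self, level: int = 6):
        self.level = level

    def encode(self, buf: np.ndarray) -> bytes:
        # Compress the raw bytes of the (contiguous) array.
        return zlib.compress(np.ascontiguousarray(buf).tobytes(), self.level)

    def decode(self, buf: bytes, dtype=np.uint8, shape=None) -> np.ndarray:
        arr = np.frombuffer(zlib.decompress(buf), dtype=dtype)
        return arr.reshape(shape) if shape is not None else arr
```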

To use ImageMagick for quick experiments at the commandline:

You need ImageMagick version > 7.0.10 to use JPEG-XL.

To install ImageMagick >7.0.10:

sudo apt-get install imagemagick libmagick++-dev 

See here for how to install ImageMagick from source.

Then use the `magick` command, not `convert`.

I'll do some experiments later today or tomorrow :slightly_smiling_face:

TODO

cgohlke commented 2 years ago

create a stand-alone little adapter library to adapt imagecodecs to be used with Zarr

Try https://github.com/cgohlke/imagecodecs/blob/ddddb7bb0e34aa243e1b45c04b545343c0569828/imagecodecs/numcodecs.py#L527-L574

JackKelly commented 2 years ago

Awesome, thanks so much for your help!

JackKelly commented 2 years ago

Some initial results compressing a single, full-resolution, full-geo-extent satellite image (5576x3560, 8 bits per pixel, grayscale) (HRV_MSG3-SEVI-MSG15-0100-NA-20190620105917_cropped.png) using cjxl --quality=100 on our data-processing VM on GCP (32-core EPYC with 128 GB of RAM):

| effort | runtime | filesize (MB) |
|---|---|---|
| 1 | 0.7 sec | 6.6 |
| 7 | 20 sec | 4.9 |
| 9 | 2 min 52 sec | 4.8 |
| visually lossless, effort=9 | 2 min 30 sec | 1.5 |
| visually lossless, effort=5 | 0.7 sec | 1.5 |

It's notable that cjxl spends the vast majority of its time using a single CPU core, which suggests it'll play nicely with Dask (whereby multiple Zarr chunks can be compressed in parallel).

"visually lossless" means --distance=1. I should check to see just how lossless "visually lossless" is.

Some other conversions done using imagemagick:

| algo | filesize (MB) |
|---|---|
| WebP lossless | 5.9 |
| JPEG lossless | 6.8 |
| JPEG 2k lossless | 5.7 |
| WebP near_lossless=100 | 0.6 |
| PNG | 6.7 |
| AVIF lossless | 6.1 |
| AVIF min=2 max=2 | 3.2 |
| tiff.bz2 | 7.5 |

So, in conclusion, our hunch was correct: JPEG-XL can create the smallest lossless files, but it takes a while! Maybe that's OK if Dask can compress chunks in parallel. And maybe compute cost grows with, say, the square of the image size, so maybe it's way faster to compress multiple small chunks? (I'm totally guessing here!)
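The "one core per image" observation suggests per-chunk parallelism should scale well. A minimal sketch of the idea using a thread pool, with bz2 standing in for the single-threaded JPEG-XL encoder (Dask would orchestrate the same pattern across Zarr chunks; names are illustrative):

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def compress_chunk(chunk: np.ndarray) -> bytes:
    # Stand-in for a single-threaded JPEG-XL encode of one Zarr chunk.
    # bz2.compress releases the GIL, so threads genuinely overlap.
    return bz2.compress(chunk.tobytes())


def compress_all_chunks(chunks, max_workers: int = 4) -> list:
    # Compress many chunks concurrently, one worker per chunk.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_chunk, chunks))
```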

And it looks like JPEG-XL can create a file size that's 64% of the size of a bz2 compressed file (7.5 MB for bz2 vs 4.8 MB for JXL)

TODO: Check if 'visually lossless' is good enough.

JackKelly commented 2 years ago

Reading through the JPEG-XL code in imagecodecs, it looks like all we have to do is cast our np.int16 images to np.uint16 and pass the images in, one satellite channel at a time, and the code will figure out that it should use grayscale :slightly_smiling_face:
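A minimal sketch of that cast (the clip guards against the -1 NaN marker in our data, which a bare cast would wrap to 65535; the function name is mine):

```python
import numpy as np


def prepare_channel(img_i16: np.ndarray) -> np.ndarray:
    """Cast one satellite channel from int16 to uint16 before JPEG-XL encoding.

    Negative values (-1 marks NaN in our Zarrs) are clipped to 0 first;
    a bare astype would wrap -1 to 65535.
    """
    return img_i16.clip(min=0).astype(np.uint16)
```

The result can then be passed, one channel at a time, to imagecodecs' JPEG-XL encoder.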

JackKelly commented 2 years ago

I'm probably doing something stupid but I'm failing to get imagecodecs.jpegxl_encode to work... I've submitted an issue: https://github.com/cgohlke/imagecodecs/issues/29

JackKelly commented 2 years ago

OK, some initial findings about using JPEG-XL with Zarr:

For jpegxl_encode to recognise the first dimension as a time dim, it's necessary to append a length-1 channel dim as the last dim, and to convert to uint16 or uint8: e.g. satellite_dataset.expand_dims(dim="channel", axis=-1).clip(min=0).astype(np.uint16)

To engage lossless mode, use JpegXL(distance=0, level=None, effort=<int between 1 and 9>). At the moment, numcodecs.jpegxl_encode enables lossless mode if and only if level is None or < 0 (which I think might be a bug: level actually sets the decoding speed. I've submitted a bug here).

Using lossy mode is a little fiddly. To engage lossy mode, level must not be None. When using uint16 values in the range [0, 1023], I think JPEG-XL assumes the image is really dark, and compresses the hell out of it unless distance is tiny (like 0.01). distance can be 0 or 0.01; it can't be non-zero and less than 0.01. If using values in the range [0, 1023] and distance=0.01, then the output values are in the wrong range, although the image looks OK.

Expanding the range of values to the range [0, 2**16-1] means that distance can be more like 1. But it looks like it changes the gamma of the image, which is probably bad for our use-case. Here's the compressed image:

[image]

And the raw uncompressed image:

[image]

which I have a hunch is because this bit of code uses SRGB if the image is uint8, else LinearSRGB. Using uint8, the compressed image looks a lot better:

[image]

Here are some results:

For four timesteps:

| algo | runtime | size |
|---|---|---|
| bz2, level=5 | 6.8 sec | 2.9 MB |
| jpeg-xl, distance=0, effort=7 | 8.15 sec | 2.2 MB |
| jpeg-xl, distance=0, effort=9 | 29 sec | 2.2 MB |
| jpeg-xl, distance=1, effort=7 | 8.2 sec | 2.2 MB |
| jpeg-xl, distance=0.5, effort=9, uint8 | 16.2 s | 0.6 MB |
| jpeg-xl, distance=0.5, effort=7, uint8 | 6.8 s | 0.6 MB |
| jpeg-xl, distance=0.0, effort=7, uint8 | 8.58 s | 1.3 MB |
| bz2, level=5, uint8 | 5.64 sec | 1.8 MB |

bz2 with uint8 is very impressive, tbh. jpeg-xl uint8 is even better, of course :slightly_smiling_face:

If the full dataset is 25TB using bz2, then it'd be more like 5 TB if using jpeg-xl lossy uint8 :) or 15 TB using uint8 bz2. A very nice thing about 5 TB is it'll fit onto a consumer 8TB SSD :slightly_smiling_face:

For a public dataset, bz2 uint8 might be the way to go because it's easier to install.

JackKelly commented 2 years ago

imagecodecs' JPEG-XL support can simply be installed through pip (nice!). No need for libjxl to be installed manually as well. That's great news because it means that enabling JPEG-XL compression in Python is as simple as pip install imagecodecs :slightly_smiling_face:

JackKelly commented 2 years ago

Some conclusions:

Detailed results:

For four timesteps:

| algo | runtime | size | MAE | PSNR |
|---|---|---|---|---|
| bz2, level=5, int16 | 6.8 sec | 2.9 MB | | |
| jpeg-xl, distance=0, effort=7, uint16 | 8.15 sec | 2.2 MB | | |
| jpeg-xl, distance=0, effort=9, uint16 | 29 sec | 2.2 MB | 0 | |
| jpeg-xl, distance=0, effort=9, float16 | 27.5 sec | 2.2 MB | 0 | 100 |
| jpeg-xl, distance=1, effort=7, uint16 | 8.2 sec | 2.2 MB | | |
| jpeg-xl, distance=0.5, effort=9, uint8 | 16.2 s | 0.6 MB | 0.378 | 51.27 |
| jpeg-xl, distance=0.5, effort=8, uint8 | 7.8 s | 0.6 MB | 0.379 | 51.26 |
| jpeg-xl, distance=0.5, effort=7, uint8 | 6.8 s | 0.6 MB | 0.365 | 51.51 |
| jpeg-xl, distance=0.0, effort=7, uint8 | 8.58 s | 1.3 MB | | |
| jpeg-xl, distance=0.0, effort=9, uint8 | 18.2 s | 1.3 MB | | |
| jpeg-xl, distance=1.0, effort=9, uint8 | 12.8 s | 0.4 MB | 0.593 | |
| jpeg-xl, distance=0.1, effort=9, uint8 | 35.6 s | 1.8 MB | | |
| jpeg-xl, distance=0.2, effort=8, uint8 | 10.6 s | 1.2 MB | 0.136 | 56.73 |
| jpeg-xl, distance=0.3, effort=8, uint8 | 7.8 s | 0.86 MB | 0.242 | 53.99 |
| jpeg-xl, distance=0.4, effort=8, uint8 | 8.08 s | 0.69 MB | 0.317 | 52.45 |
| jpeg-xl, distance=0.6, effort=8, float16 | 7.8 s | 0.74 MB | 0.376 (0.353 after rounding) | 52.34 |
| jpeg-xl, distance=0.5, effort=7, float16 | 5.43 s | 0.87 MB | 0.324 | 53.85 |
| jpeg-xl, distance=0.4, effort=8, float16 | 7.8 s | 1.1 MB | 0.2749 | 55.17 |
| jpeg-2k, level=100, uint8 | 4.9 s | 1.1 MB | 0.1625 | 56.01 |
| jpeg-2k, level=80, uint8 | 5.2 s | 1.1 MB | 0.163 | 55.99 |
| jpeg-2k, level=65, uint8 | 5.0 s | 1.0 MB | 0.178 | 55.55 |
| jpeg-2k, level=60, uint8 | 5.0 s | 0.9 MB | 0.210 | 54.78 |
| jpeg-2k, level=55, uint8 | 5.0 s | 0.75 MB | 0.2935 | 53.01 |
| jpeg-2k, level=50, uint8 | 4.9 s | 0.5 MB | 0.468 | 49.79 |
| bz2, level=5, uint8 | 5.64 sec | 1.8 MB | | |

Some plots of jpeg-xl, distance=0.6, effort=8, float16:

The compressed image: [image]

[image]

[image]

[image]

[image]

The notebook for creating these plots is here

JackKelly commented 2 years ago

Some notes from today's experiments...

JPEG-XL compression for our pre-prepared batches

Our "pre-prepared batches" have five dimensions:

  1. example: 32
  2. time: 24
  3. y: 24
  4. x: 24
  5. channels: 11

AFAICT, NetCDF cannot use the numcodecs API. And so we cannot use JPEG-XL (or any other imagecodecs) to compress data into NetCDF files. (Which is a shame because it might have been nice to use JPEG-XL to compress our pre-prepared batches.)

We could use Zarr for our pre-prepared batches. But the JpegXl class in imagecodecs only really works with images which are of shape (y, x). We can make this work like this:

import numpy as np
from imagecodecs.numcodecs import JpegXl
from numcodecs.registry import register_codec


class JpegXlFuture(JpegXl):
    """Simple hack to make the JpegXl compressor in the currently released
    version of imagecodecs (version 2021.11.20) look like the version in development.
    (We can't simply use the version in development because the imagecodecs author
    does not develop on GitHub. The imagecodecs author just uses GitHub as one of
    several mechanisms to release imagecodecs.)

    See https://github.com/cgohlke/imagecodecs/issues/31#issuecomment-1026179413
    """

    codec_id = 'imagecodecs_jpegxl'

    def __init__(self, lossless=None, decodingspeed=None, level=None, distance=None, *args, **kwargs):
        """
        Args:
            distance: Lowest settings are 0.00 or 0.01.  If 0.0 then also set lossless to True.
            level: DON'T SET THIS WITH THIS JpegXlFuture wrapper!
                In imagecodecs version 2021.11.20, level is mapped (incorrectly) to the decoding speed tier.
                Minimum is 0 (highest quality), and maximum is 4 (lowest quality). Default is 0.
            decodingspeed: DON'T SET THIS WITH THIS JpegXlFuture wrapper!
        """
        assert decodingspeed is None
        if lossless is not None:
            if lossless:
                assert level is None  # level must be None to enable lossless in imagecodecs 2021.11.20.
                assert distance is None or distance == 0
            else:
                # Enable lossy compression.
                level = 0  # level must be 0, 1, 2, 3, or 4 to enable lossy compression in imagecodecs 2021.11.20.
        super().__init__(level=level, distance=distance, *args, **kwargs)

    def encode(self, buf):
        # Expect shape (example, time, y, x, channel) with singleton example/time/channel dims.
        n_examples = buf.shape[0]
        n_timesteps = buf.shape[1]
        n_channels = buf.shape[-1]
        assert n_examples == 1
        assert n_timesteps == 1
        assert n_channels == 1
        buf = buf[0]
        return super().encode(buf)

    def decode(self, buf, out=None):
        assert out is None
        out = super().decode(buf, out)
        out = out[np.newaxis, ...]
        return out


register_codec(JpegXlFuture)

# Convert to uint8. Integer-divide by 4 so that 1023 maps to 255
# (dividing by 4 then rounding would map 1023 to 256, which overflows uint8).
satellite_dataset["data"] = (satellite_dataset["data"].clip(min=0, max=1023) // 4).astype(np.uint8)

lossless = False
distance = 0.6
effort = 8

encoding = {
    "data": {
        "compressor": JpegXlFuture(lossless=lossless, distance=distance, effort=effort),
    },
}

chunks = dict(example=1, time_index=1, y_geostationary_index=None, x_geostationary_index=None, channels_index=1)

satellite_dataset.chunk(chunks).to_zarr(
    DST_PATH / (filename.stem + ".zarr"),
    mode="w",
    encoding=encoding,
    compute=True,
)

But this takes a whole minute per batch, and you end up with lots of tiny Zarr chunks on disk (a few tens of bytes each!)

The JPEG-XL spec allows for multiple images within a single JPEG-XL file. But I don't think imagecodecs.JpegXl knows how to decode multiple images. (Although I think it does know how to encode multiple images!) When encoding all timesteps into a single image, it "only" takes 20 seconds per batch. (I've asked a question on the imagecodecs issue queue here)

JackKelly commented 2 years ago

Using JPEG-XL for intermediate Zarrs

I'm much more optimistic about using JPEG-XL for our intermediate Zarrs.

(In our current Zarrs, the "real" pixel values are in the range [0, 1023], and NaNs are encoded as -1).

The slightly fiddly thing is how to encode "NaNs".

JPEG-XL supports uint8, uint16, float16, and float32.

Trying to compress a float16 or float32 image which includes any NaNs results in JPEG-XL silently failing to save any image data to disk (unsurprisingly: NaNs aren't really a thing in "proper" photography, which is what JPEG-XL is mostly designed for).

When using float16 or float32, the image values are supposed to be in the range [0, 1]. It appears it is possible to compress images where some values are outside that range (e.g. -1). But the "illegal" values change quite dramatically. And I'm keen to not do anything that's out-of-spec, for fear that it breaks something.

Where I've landed is for "real" image values to be in the range [0.075, 1], and use 0.025 to encode NaNs. For testing, I injected a 100 x 100 pixel "NaN box" of 0.025 values in the bottom right of the image, like this:

original_float32["data"][:, 100:200, 100:200] = 0.025

[image]

This does result in a bit of "ringing" (for want of a better term) around the edges of the "NaN box":

[image]

All values in the "NaN box" are between 0.0227 and 0.0280.

(The ringing has much larger amplitude if we use 0.95 as the "NaN marker"; then the values in the "NaN box" range from about 0.90 to 0.99.)

Here's the code I'm using:

# `original` holds the raw data, in which -1 encodes NaN.
original = original.astype(np.float64)  # Use float64 for the maths. Then convert to float32 at the end.

# Encode "NaN" as 0.025. Encode all other values in the range [0.075, 1].
# After JPEG-XL compression there is slight "ringing" around the edges,
# so, after decompression, use "image < 0.05" to find NaNs.
LOWER_BOUND_FOR_REAL_PIXELS = 0.075
NAN_VALUE = 0.025
original_float32 = original.clip(min=0, max=1023)
original_float32 /= 1023 * (1 + LOWER_BOUND_FOR_REAL_PIXELS)
original_float32 += LOWER_BOUND_FOR_REAL_PIXELS
original_float32 = original_float32.where(
    cond=original >= 0,
    other=NAN_VALUE,
)
original_float32 = original_float32.astype(np.float32)

lossless = False
distance = 0.4
effort = 8

encoding = {
    "data": {
        "compressor": JpegXlFuture(lossless=lossless, distance=distance, effort=effort),
    },
}

# Need to add a length-1 channel dim so that JpegXl interprets the first dimension as "time".
# original_float32 has chunks: `time=1, y=None, x=None`
zarr_store = original_float32.expand_dims(dim="channel", axis=-1).to_zarr(
    OUTPUT_ZARR_PATH,
    mode="w",
    encoding=encoding,
    compute=True,
)

Doing this, the PSNR is a very healthy 54.37. But beware that PSNR increases if you simply brighten the image (which is what we've done!)

The size on disk for my four test timesteps is a total of 690 kB, which is 24% of the size of the same image compressed using bz2 :slightly_smiling_face:

Summary

Let's use JPEG-XL, distance=0.4, effort=8, lossless=False. Encode NaNs as 0.025. Encode "real" pixel values in the range [0.075, 1]. Then recover the original image with something like:

# Set all values below 0.05 as NaN:
recovered_image = decompressed_image.where(decompressed_image > 0.05)
recovered_image -= LOWER_BOUND_FOR_REAL_PIXELS
recovered_image *= 1023 * (1 + LOWER_BOUND_FOR_REAL_PIXELS)

Doing this gives a PSNR of 53.74.
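Putting the encode and decode halves together as plain numpy (no JPEG-XL in the loop, just the value mapping; function names are mine) shows the mapping round-trips exactly, up to float32 precision:

```python
import numpy as np

LOWER_BOUND_FOR_REAL_PIXELS = 0.075
NAN_VALUE = 0.025
SCALE = 1023 * (1 + LOWER_BOUND_FOR_REAL_PIXELS)


def encode_values(data: np.ndarray) -> np.ndarray:
    """Map [0, 1023] into roughly [0.075, 1]; the -1 NaN marker becomes 0.025."""
    nan_mask = data < 0
    out = data.clip(0, 1023).astype(np.float64) / SCALE + LOWER_BOUND_FOR_REAL_PIXELS
    out[nan_mask] = NAN_VALUE
    return out.astype(np.float32)


def decode_values(data: np.ndarray) -> np.ndarray:
    """Invert encode_values; anything below 0.05 (ringing-safe margin) becomes NaN."""
    out = np.where(data > 0.05, data.astype(np.float64), np.nan)
    return (out - LOWER_BOUND_FOR_REAL_PIXELS) * SCALE
```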