hotosm / oam-dynamic-tiler

Dynamic tiling of raster data for OpenAerialMap + others

JPEG Compression, Masking, and Mosaics #10

Closed mojodna closed 7 years ago

mojodna commented 7 years ago

JPEG Compression + Masking

While attempting to process some large images as part of a UAV-produced scene in Haiti, gdaladdo was giving me trouble producing external overviews (per the previous workflow). Specifically, it would get ~70% of the way through and then appear to stall (with high memory use and 100% CPU). I was also seeing variable behavior between GDAL 1.x and 2.x. The combination of these things caused me to re-read the internet and re-evaluate how we can/should store GeoTIFF sources.

Paul Ramsey's "GeoTIFF Compression for Dummies" and Even Rouault's "Advanced JPEG-in-TIFF Uses in GDAL" led me to the conclusion that I should revisit my investigation of JPEG compression (which I'd bailed on earlier due to alpha transparency problems when using overviews, plus success with DEFLATE on DEM data; more on the transparency problems shortly) and look into using the YCbCr color space to see what sort of space savings we could achieve.

Switching from DEFLATE-compressed 4-band (RGBA) tiled GeoTIFFs to JPEG-compressed (PHOTOMETRIC=YCbCr) reduced source file size approximately 10x (on top of the reduction already achieved over uncompressed uploads).

The reduced file size meant that we now had enough overhead to be able to include overviews (per Paul Ramsey's desire/recommendation) in the files we distribute while keeping them a fraction of the previous size. The alpha transparency problems resurfaced, however, resulting in the need for some black magic (and external masks).

What was 2.9GB (DEFLATEd) is now 372MB (including overviews) + 5.5MB (external mask w/ overviews).

Ok, so the alpha transparency problems... JPEGs notionally support only 3 bands (RGB), which is a hard limit when using YCbCr. This means that anywhere we'd normally use an alpha channel we'd end up with (0,0,0) (black). Not ideal when outputs are intended to be composited over one another.

image

The typical approach appears to be to extract the alpha band (-b 4 in gdal_translate parlance) into a mask band (-mask 4), which is stored separately (as <file>.tif.msk by default) and treated differently: because its use is explicitly masking, its bit depth can be reduced and alternate compression (DEFLATE, I think) used on it.

When working with such TIFFs (produced via bin/transcode.sh using gdal_translate -b 1 -b 2 -b 3 -mask 4 -co TILED=yes -co COMPRESS=JPEG -co PHOTOMETRIC=YCbCr -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 $input $output) at their native resolution (no reprojection or overviews in play), all is well.

Well, mostly. I've been using QGIS to spot check things and it doesn't yet support masks, so a workaround was needed. A VRT can be created that treats the mask as an alpha channel (leaving it in its separate file + with specialized treatment): gdal_translate -b 1 -b 2 -b 3 -b mask -of VRT $output.tif $vrt. QGIS handles this just fine and displays areas that should be masked as transparent.

Great. These VRTs now get produced alongside the source TIFF and its .msk sidecar (which is a standard thing), and we can make them available for download to help people out.

When overviews are added to the RGB source TIFF (using Lanczos resampling), gdaladdo doesn't add them to the .msk (which is a lot smaller anyway, though when working with lots of pixels, overviews there would still simplify things a bunch). QGIS (et al.) are fine with this, producing transparent areas where expected.
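For reference, here's a hedged sketch of the overview-building step; the workflow itself uses gdaladdo, but the rasterio equivalent looks roughly like this (the filename and overview factors are placeholders, not the project's actual values):

import rasterio
from rasterio.enums import Resampling

# build internal Lanczos overviews at a handful of decimation levels
with rasterio.open('scene.tif', 'r+') as dst:
    dst.build_overviews([2, 4, 8, 16], Resampling.lanczos)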

However, it gets tricky again.

One of the fundamental things that makes this dynamic tiling mechanism possible is GDAL's ability to create "warped VRTs". These are amazing: when produced with consistent dimensions (leading to mass quantities of NODATA, but it's all virtual), "pixel" windows (corresponding to tile requests at the zoom level matching the VRT's dimensions, scaled as needed) can be read out of the VRT, reprojecting (and resampling) the source on the fly and using overviews when possible (via decimated reads).
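For illustration, here's roughly what a windowed, decimated read through a warped VRT looks like using rasterio's WarpedVRT (the project builds its warped VRTs with gdalwarp; the filename, target CRS, and window below are placeholders):

import rasterio
from rasterio.enums import Resampling
from rasterio.vrt import WarpedVRT
from rasterio.windows import Window

TILE_SIZE = 256

with rasterio.open('scene.tif') as src:
    # wrap the source in a virtual, reprojected view of itself
    with WarpedVRT(src, crs='EPSG:3857', resampling=Resampling.lanczos) as vrt:
        # a window in the VRT's pixel space corresponding to one tile request;
        # out_shape forces a decimated read, which can use overviews
        window = Window(col_off=0, row_off=0, width=1024, height=1024)
        data = vrt.read(out_shape=(vrt.count, TILE_SIZE, TILE_SIZE), window=window)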

I had to massage the warped VRT by hand to add MaskBand entries in order for it to recognize the masks present in the source imagery (gdalwarp currently only generates alpha bands). However, the edges of the imagery weren't cleanly masked, with artifacts showing up at all resolutions.

image

(This is from a pair of images being mosaicked, but it demonstrates the problem.)

I don't know the root cause of this (and soldiered on boldly), but what I suspect is happening is: GDAL uses the same resampling method (Lanczos, in this case) for the RGB and mask bands. This is fine for the RGB channels, but produces interpolated mask values between 0 and 255 (the only values present in the source mask), which result in partially-transparent areas, i.e. the "artifacts."
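To make that suspicion concrete, here's a tiny, artificial numpy illustration (not the tiler's code; linear interpolation stands in for Lanczos) of how an interpolating kernel turns a hard 0/255 mask edge into in-between values:

import numpy as np

hard_mask = np.array([0, 0, 255, 255], dtype=np.float64)
# sample between pixel centres with an interpolating kernel
print(np.interp([0.5, 1.5, 2.5], np.arange(4), hard_mask))
# -> [0., 127.5, 255.]; that 127.5 becomes a half-transparent pixel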

Rather than continuing to beat my head against this (I'd already come full circle) or retreating to DEFLATEd TIFFs (which I was having trouble generating overviews for), I took on the masking responsibility myself.

Rather than creating a single warped VRT with a mask band, the processing workflow now produces a pair of warped VRTs: one for the RGB channels, resampled using Lanczos, and one for the mask band, resampled using NearestNeighbour (which prevents interpolated values).
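Conceptually, the split looks like this in rasterio terms (heavily hedged: the real workflow produces two VRT files ahead of time with gdalwarp, and the filenames and CRS here are placeholders for the RGB source and its mask):

import rasterio
from rasterio.enums import Resampling
from rasterio.vrt import WarpedVRT

with rasterio.open('scene.tif') as rgb_src, rasterio.open('scene_mask.tif') as mask_src:
    # interpolating kernel for the colour bands
    rgb_vrt = WarpedVRT(rgb_src, crs='EPSG:3857', resampling=Resampling.lanczos)
    # nearest-neighbour for the mask, so values stay strictly 0 or 255
    mask_vrt = WarpedVRT(mask_src, crs='EPSG:3857', resampling=Resampling.nearest)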

Within the Python tiler, windows are read from both VRTs and concatenated (RGB → RGBA) using numpy:

import numpy as np
import rasterio


def read_window(window, src_url, mask_url=None, scale=1):
    tile_size = 256 * scale

    with rasterio.Env():
        # get_source() is a project helper that opens (and caches) the dataset
        # behind a URL
        src = get_source(src_url)
        # out_shape forces a decimated read, which can take advantage of overviews
        data = src.read(out_shape=(3, tile_size, tile_size), window=window)

        if mask_url:
            mask = get_source(mask_url)
            mask_data = mask.read(out_shape=(1, tile_size, tile_size), window=window)
        else:
            # no mask available; treat the source as fully opaque
            mask_data = np.full((1, tile_size, tile_size),
                                np.iinfo(src.profile['dtype']).max,
                                src.profile['dtype'])

    # RGB + mask -> RGBA
    return np.concatenate((data, mask_data))

Voilà, clean edges:

image

Mosaics

With transparency sorted, it was time to figure out mosaicking. I was initially hopeful that I could create a VRT of VRTs and let GDAL do the heavy lifting. However, that led back to the problems with alpha transparency mentioned above.

I considered using a Map Stack-like approach, initially using WMTS layers and delegating to GDAL. However, the WMTS layers reported correct extents for their underlying sources, meaning that windows weren't consistent across them.

Using the actual Map Stack approach (which fetches images from remote sources + composites them locally) was an option, but would have resulted in many more Lambda function invocations (and/or recursive HTTP requests), especially for areas that didn't actually contain data (for varying reasons).

Instead, I settled on an approach where I filter the sources available for a scene according to whether they intersect the boundary of the tile being requested, read windows from each of those, and alpha-composite them together (using the masks).
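For a sense of what that filter might look like, here's a hedged sketch of an intersects(tile) predicate; it assumes mercantile tiles and a GeoJSON footprint under each source's 'geometry' key, which may not match the project's actual metadata:

import mercantile
from shapely.geometry import box, shape

def intersects(tile):
    # tile bounds in lon/lat: (west, south, east, north)
    tile_geom = box(*mercantile.bounds(tile))

    def _intersects(source):
        return shape(source['geometry']).intersects(tile_geom)

    return _intersects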

This worked great, but was fairly slow, especially for the first request (when all of the sources needed to be initialized, loading the TIFF data from S3). Being able to initialize sources (really: read windows) in parallel seemed like the ideal solution. multiprocessing was out, as it would prevent sources from being cacheable in the main process.

Thread pooling (via multiprocessing.dummy) initially seemed promising until I tried it: sources were still being read sequentially. Further investigation led me to realize that this was a result of Python's Global Interpreter Lock (GIL).
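For reference, the pool used in the snippet below comes from multiprocessing.dummy; a minimal sketch (the pool size is an arbitrary placeholder):

# multiprocessing.dummy exposes the Pool API backed by threads rather than
# processes, so cached sources stay shared in the main process
from multiprocessing.dummy import Pool

pool = Pool(8)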

Poking through rasterio's Cython code surfaced an opportunity to release the GIL while reading sources, so I made that change and bundled a patched source build.

Source and mask pairings are still read sequentially (optimization opportunity!), but windows are now read in parallel (using the thread pool) and subsequently composited using numpy:

# intersects() and read_masked_window() are helpers elsewhere in the tiler;
# pool is the multiprocessing.dummy thread pool
sources = filter(intersects(tile), meta['meta'].get('sources', []))

data = np.zeros((4, 256 * scale, 256 * scale), dtype=np.uint8)

# read windows in parallel and alpha composite
for d in pool.map(partial(read_masked_window, tile=tile, scale=scale), sources):
    mask = d[3] > 0             # opaque pixels contributed by this source
    mask = mask[np.newaxis, :]  # shape (1, h, w) broadcasts across all 4 bands
    data = np.where(mask, d, data)

return data

image

This piecemeal approach to combining sources (particularly filtering sources that cover the current tile) seems like it would lend itself to curated mosaics, possibly even including a global OpenAerialMap one, especially if the OAM API is extended to produce tile-level (at a reasonably low zoom) lists of relevant sources, sorted by resolution / time.

Transcoding

The Docker image is now built with GDAL 2.2-dev in order to benefit from improvements when using JPEGs in cloud-optimized GeoTIFFs. Its ENTRYPOINT has also been removed so that it can be used not only as a local tile server but also as a pre-built image to facilitate transcoding images (using AWS Batch or otherwise).

/cc @cgiovando @smit1678 @lossyrob @rouault