locationtech-labs / geopyspark

GeoTrellis for PySpark

Add COG Reading and Writing Support #681

Open jbouffard opened 5 years ago

jbouffard commented 5 years ago

Cloud Optimized GeoTIFFs (COGs) are GeoTIFFs laid out so that they can be stored and accessed efficiently in the cloud. Having the ability to read and write TiledRasterLayers as COGs would be very beneficial. The logic already exists in GeoTrellis, so getting this functionality into GPS should just be a matter of writing wrappers around the existing Readers and Writers.
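
To make the requested surface area concrete, here is a purely hypothetical sketch of what the Python-side wrappers might look like; write_cog_layer, read_cog_layer, and their parameters are illustrative placeholders, not existing geopyspark functions, and the real signatures would follow whatever the GeoTrellis COG readers and writers expose.

# Hypothetical wrapper API -- names and signatures are placeholders only.
# The real implementations would delegate to GeoTrellis's COG layer
# reader/writer over the JVM gateway, mirroring the existing catalog functions.

def write_cog_layer(uri, layer_name, tiled_raster_layer, zoom=None):
    raise NotImplementedError("proposed COG writer wrapper")

def read_cog_layer(uri, layer_name, zoom):
    raise NotImplementedError("proposed COG reader wrapper")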

jpolchlo commented 5 years ago

Not a solution, but a stop-gap:

from math import log
import urllib.parse as urlparse

import boto3
import numpy as np
import rasterio as rstr
import rasterio.shutil as rshutil
import rasterio.warp as rwarp
from affine import Affine


def write_cog(layout, location, data, filename_template):
    """
    Write a GPS tile out as a Cloud-Optimized GeoTIFF

    Uses GDAL to write a COG out to disk, utilizing the given layout and location
    to generate the GeoTIFF header.

    Args:
        layout (``gps.LayoutDefinition``): The layout for the source layer
        location (``gps.SpatialKey`` or ``gps.SpaceTimeKey``): The cell identifier
            in the layer corresponding to the image data
        data (``gps.Tile``): The image data
        filename_template (str): A pattern giving the destination for the target
            image.  Contains two '{}' tokens which will be ``.format``ed with the
            column and row of the ``location``, in that order.  May be an S3 uri or
            local path.
    """
    bands, h, w = data.cells.shape  # Tile cells are shaped (bands, rows, cols)
    nodata = data.no_data_value
    dtype = data.cells.dtype
    cw, ch = cell_size(layout)
    ex = extent_for_cell(layout, location)
    # Number of overview levels needed to shrink the tile down to 256x256 (2^8)
    overview_level = int(log(w) / log(2) - 8)

    with rstr.io.MemoryFile() as memfile:
        with memfile.open(driver='GTiff',
                          count=bands,
                          width=w,
                          height=h,
                          transform=Affine(cw,  0.0, ex.xmin,
                                           0.0, -ch, ex.ymax),
                          crs=rstr.crs.CRS.from_epsg(4326),
                          nodata=nodata,
                          dtype=dtype,
                          compress='lzw',
                          tiled=True) as mem:
            windows = list(mem.block_windows(1))
            for _, win in windows:
                segment = data.cells[:, win.row_off:(win.row_off + win.height), win.col_off:(win.col_off + win.width)]
                mem.write(segment, window=win)
                # Mark a pixel as valid (255) only where no band equals nodata
                mask_value = np.all(segment != nodata, axis=0).astype(np.uint8) * 255
                mem.write_mask(mask_value, window=win)

            overviews = [2 ** j for j in range(1, overview_level + 1)]
            mem.build_overviews(overviews, rwarp.Resampling.nearest)
            mem.update_tags(ns='rio_overview', resampling=rwarp.Resampling.nearest.value)

            uri = urlparse.urlparse(filename_template)
            if uri.scheme == 's3':
                boto3.resource('s3').Bucket(uri.netloc).upload_fileobj(memfile, uri.path.format(location.col, location.row)[1:])
            else:
                rshutil.copy(mem, filename_template.format(location.col, location.row), copy_src_overviews=True)
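
The snippet above leans on two small helpers, cell_size and extent_for_cell, that aren't shown. A minimal sketch of what they could look like, assuming layout is a gps.LayoutDefinition (a namedtuple of extent and tileLayout) and location has col/row attributes:

import geopyspark as gps

def cell_size(layout):
    # Width and height of a single pixel in map units.
    ex = layout.extent
    tl = layout.tileLayout
    return ((ex.xmax - ex.xmin) / (tl.layoutCols * tl.tileCols),
            (ex.ymax - ex.ymin) / (tl.layoutRows * tl.tileRows))

def extent_for_cell(layout, location):
    # Extent of the tile at the given key; col counts from the west edge,
    # row from the north edge, following the GeoTrellis grid convention.
    ex = layout.extent
    tl = layout.tileLayout
    tile_w = (ex.xmax - ex.xmin) / tl.layoutCols
    tile_h = (ex.ymax - ex.ymin) / tl.layoutRows
    xmin = ex.xmin + location.col * tile_w
    ymax = ex.ymax - location.row * tile_h
    return gps.Extent(xmin, ymax - tile_h, xmin + tile_w, ymax)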

Use tile_to_layout() to retile the layer to a coarse layout (say, 2048 or 4096 pixels per tile), then call to_numpy_rdd() followed by foreach to write the individual COGs.
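
A rough sketch of that driver loop, assuming tiled_layer is an existing gps.TiledRasterLayer, that write_cog and its helpers are importable on the executors, and that the bucket/key template below is a placeholder (double-check the gps.LocalLayout(tile_size=...) call against your geopyspark version):

import geopyspark as gps

# Retile to a coarse layout so each output file covers a large area.
coarse = tiled_layer.tile_to_layout(gps.LocalLayout(tile_size=4096))

layout = coarse.layer_metadata.layout_definition
template = 's3://my-bucket/cogs/{}_{}.tif'  # placeholder destination

# to_numpy_rdd() yields (SpatialKey, Tile) pairs; write each one out as a COG.
coarse.to_numpy_rdd().foreach(
    lambda kv: write_cog(layout, kv[0], kv[1], template))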