thinkingmachines / geowrangler

🌏 A python package for wrangling geospatial datasets
https://geowrangler.thinkingmachin.es/
MIT License
48 stars 15 forks source link

Crop raster window and save to file #167

Closed alronlam closed 2 years ago

alronlam commented 2 years ago

This is from @tm-jc-nacpil :

Input: RasterIO Dataset Output: Cropped window saved to a new file

Reference Implementation:

def query_window(
    input_raster: Union[str, DatasetReader, PosixPath], 
    output_path: str,
    window_bounds: tuple, 
    close_input:bool = False
) -> None:
    """Query a cloud-optimized geotiff based on a window and save to disk
    Returns a subset of a rasterio dataset based on a window
    defined by (left,right,bottom,top) coordinates. This function
    assumes that the CRS of the bounds is based on the input dataset
    Args:
        input_raster (str, DatasetReader, PosixPath): Input rasterio dataset, which can be specified
            by a local filename or an open rasterio dataset
        output_path (str): Path to save output file
        window_bounds (tuple): Window bounding box specified by the (left, bottom, right, up)
            coordinates in order
        close_input (bool): If true, close the input dataset after processing
    Returns:
        None
    """
    # If input_raster is specified as PosixPath, convert to string explicitly
    if isinstance(input_raster,PosixPath):
        input_raster = str(input_raster)

    # Open new dataset if input is a path to a file
    if isinstance(input_raster,str):
        input_dset = rio.open(input_raster)
    else: 
        input_dset = input_raster

    assert (
        len(window_bounds) == 4
    ), "window_bounds must have exactly four items (left,bottom,right,top)"

    # Unroll window_bounds
    left, bottom, right, top = window_bounds

    # Get profile of input_dset
    input_profile = input_dset.profile

    # Specify window and query subset
    window = rio.windows.from_bounds(left, bottom, right, top, input_dset.transform)
    subset = input_dset.read(window=window)

    # Get the shape of the output subset
    number_of_bands, height, width = subset.shape

    # Get the transformation of the subset based on the window
    win_transform = input_dset.window_transform(window)

    # Update metadata for output
    output_profile = input_profile.copy()
    update_params = {
        "count": number_of_bands,
        "height": height,
        "width": width,
        "transform": win_transform,
    }
    output_profile.update(update_params)

    # Write image to output_file
    with rio.open(output_path, "w", **output_profile) as output_dst:
        output_dst.write(subset)
        output_dst.colorinterp = input_dset.colorinterp

    # Close the input dataset
    if close_input:
        input_dset.close()
        del input_dset
        gc.collect()

    return 
alronlam commented 2 years ago

@tm-jc-nacpil is my interpretation right that this function is actually applicable not only to COGs, but any rasterio dataset?

tm-jc-nacpil commented 2 years ago

@alronlam Yup! Any rasterio dataset We did this approach (pass a dataset instead of a path to open a new dataset) kasi we didn't want to open a new dataset everytime we queried the window, because the opening step is a bottleneck especially if you're opening COGs repeatedly

So usually the pattern when using it is

with rio.open(<path to raster>) as dst:
    query_window(dset, <output_path>, <grid bounds to clip>)
alronlam commented 2 years ago

@tm-jc-nacpil Ok got it. So to confirm, what value/convenience does this provide over rasterio's window function (this is what actually gets the subset right)? Is it the creation of the subset's tiff file, which involves setting the right metadata?