natcap / invest

InVEST®: models that map and value the goods and services from nature that sustain and fulfill human life.
Apache License 2.0
165 stars 67 forks source link

HRA: edge case where aligned datasets have different sizes #1312

Closed davemfish closed 1 year ago

davemfish commented 1 year ago

HRA accepts raster & vector inputs and aligns them all onto a raster stack with a target bounding box of the union of all the inputs.

For the rasters, we use align_and_resize_raster_stack.

For the vectors, we create an empty raster to rasterize onto, which has its size & geotransform determined by the same target_bounding_box and pixel_size passed to align_and_resize. Specifically, the size of the target raster is determined by,

n_cols = int(numpy.ceil(
    abs((bbox_maxx - bbox_minx) / target_pixel_size[0])))
n_rows = int(numpy.ceil(
    abs((bbox_maxy - bbox_miny) / target_pixel_size[1])))

Rounding up with numpy.ceil makes sense...if the original vector includes geometries extending right out to the edge of target_bounding_box, then we don't want a sub-pixel gap between the last row/col of pixels and the edge of the target_bounding_box.

For the raster inputs, which pass through align_and_resize and then warp_raster, warp_raster does a similar thing to reconcile a target_bb with width & height that may not be a multiple of the target_pixel_size , but just slightly different:

    # sometimes bounding boxes are numerically perfect, this checks for that
    x_residual = (
        abs(target_x_size * target_pixel_size[0]) -
        (working_bb[2] - working_bb[0]))
    if not numpy.isclose(x_residual, 0.0):
        target_x_size += 1
    y_residual = (
        abs(target_y_size * target_pixel_size[1]) -
        (working_bb[3] - working_bb[1]))
    if not numpy.isclose(y_residual, 0.0):
        target_y_size += 1

    if target_x_size == 0:
        LOGGER.warning(
            "bounding_box is so small that x dimension rounds to 0; "
            "clamping to 1.")
        target_x_size = 1
    if target_y_size == 0:
        LOGGER.warning(
            "bounding_box is so small that y dimension rounds to 0; "
            "clamping to 1.")
        target_y_size = 1

    # this ensures the bounding boxes perfectly fit a multiple of the target
    # pixel size
    working_bb[2] = working_bb[0] + abs(target_pixel_size[0] * target_x_size)
    working_bb[3] = working_bb[1] + abs(target_pixel_size[1] * target_y_size)

This is effectively the same as the HRA method -- expanding the target bounding box as needed to accomadate an extra row or column of pixels. But there's an edge case where numpy.isclose(x_residual, 0.0) is True, so no extra column is added during align_and_resize. But the extra column is added in HRA because it simply takes numpy.ceil.

My inclination would be to change the logic in HRA and use the approach found in warp_raster instead of using numpy.ceil

davemfish commented 1 year ago

Oh, and this manifests as, an Access window out of range in RasterIO() error from gdal.ReadAsArray

2023-05-11 13:39:59,934 (pygeoprocessing.geoprocessing_core) geoprocessing.distance_transform_edt(2501) INFO Distance Transform Phase 2
2023-05-11 13:40:02,787 (osgeo) utils._log_gdal_errors(97) ERROR [errno 5] C:/Users/gabriel.perilla/Documents/iclei/invest/riskHabitat\intermediate_outputs\decayed_edt_aeropuerto_mon.tif, band 1: Access window out of range in RasterIO().  Requested(2560,0) of size 233x256 on raster of 2792x2972.
2023-05-11 13:40:02,789 (taskgraph.Task) Task.add_task(706) ERROR Something went wrong when adding task Calculate E score for sierra chiquita / aeropuerto (72), terminating taskgraph.
Traceback (most recent call last):
  File "taskgraph\Task.py", line 674, in add_task
  File "taskgraph\Task.py", line 1093, in _call
  File "natcap\invest\hra.py", line 2031, in _calc_criteria
TypeError: 'NoneType' object is not subscriptable
2023-05-11 13:40:02,792 (natcap.invest.utils) utils.prepare_workspace(166) ERROR Exception while executing Habitat-Risk-Assessment
Traceback (most recent call last):
  File "natcap\invest\utils.py", line 164, in prepare_workspace
  File "natcap\invest\ui\model.py", line 1632, in _logged_target
  File "natcap\invest\hra.py", line 591, in execute
  File "taskgraph\Task.py", line 674, in add_task
  File "taskgraph\Task.py", line 1093, in _call
  File "natcap\invest\hra.py", line 2031, in _calc_criteria
TypeError: 'NoneType' object is not subscriptable
davemfish commented 1 year ago

I just noticed pygeoprocessing.create_raster_from_bounding_box exists and is basically identical to hra._create_raster_from_bounding_box, which is the function doing the n_cols and n_rows logic with numpy.ceil.

Should HRA start using pygeoprocessing.create_raster_from_bounding_box and should that function be where this fix gets applied?

I suppose it's not a bug when create_raster_from_bounding_box is used in isolation. The bug is in HRA which expects that raster to have the same size as one created with warp_raster using the same target_bounding_box and pixel_size. Nevertheless, maybe it is good for pygeoprocessing to be internally consistent across those two functions. What do you think @phargogh ?

phargogh commented 1 year ago

I agree, consistent behavior should be the priority here! I'll make an issue for this in pygeoprocessing.

The function pygeoprocessing.create_raster_from_bounding_box was added to pygeoprocessing as a result of it being useful in HRA, but I must have forgotten to update the model with the new function once we updated to pygeoprocessing>=2.4.0.