pjhartzell / raster-footprint

Create GeoJSON geometries that bound valid raster data
https://raster-footprint.readthedocs.io
Apache License 2.0

Use rasterio's `read_masks` method #2

Closed: pjhartzell closed this issue 1 year ago

pjhartzell commented 1 year ago

I've found that using rasterio masks is much faster (more than an order of magnitude) than the `data_mask` class method. I think we should default to using rasterio's `read_masks` method.

A few possible ways to make this change:

  1. Move mask creation out of the core `RasterFootprint` methods and into the alternative constructors, where we already read the data file with rasterio. The constructor's `data_array` argument would be replaced with something like `mask_array`. This is clean, but it requires users to build their own mask array if they call the class constructor directly. We can keep the existing mask creation logic as a free-function utility for this purpose, but we would still be asking users to do more work in some cases.
  2. Add an `is_mask` flag that tells the class to skip mask creation (ignoring any `no_data` values) and use the `data_array` argument as the mask.
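Option 2 could look roughly like the sketch below. The class name `FlaggedFootprint` and its mask rule (a pixel is valid if any band differs from `no_data`) are hypothetical illustrations, not the real `RasterFootprint` API. The point is that a single constructor either builds the mask or, when `is_mask=True`, accepts a ready-made mask such as the output of `read_masks` and ignores `no_data`.

```python
from typing import Optional

import numpy as np


class FlaggedFootprint:
    """Hypothetical sketch of option 2: an `is_mask` flag on the constructor."""

    def __init__(
        self,
        data_array: np.ndarray,
        no_data: Optional[float] = None,
        is_mask: bool = False,
    ) -> None:
        if is_mask:
            # Treat the input as a mask directly; any nonzero value is valid
            # and no_data is ignored.
            self.mask = data_array.astype(bool)
        else:
            # Assumed rule: promote 2D input to (bands, rows, cols), then mark
            # a pixel valid if any band differs from no_data.
            arr = data_array if data_array.ndim == 3 else data_array[np.newaxis, ...]
            if no_data is None:
                self.mask = np.ones(arr.shape[1:], dtype=bool)
            else:
                self.mask = np.any(arr != no_data, axis=0)
```

For example, `FlaggedFootprint(data, no_data=0)` and `FlaggedFootprint(rasterio_mask, is_mask=True)` should produce the same boolean mask for the same raster. The trade-off is that the meaning of `data_array` now depends on a flag.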

I'm open to other ideas.

pjhartzell commented 1 year ago

I'm implementing option 1 from above. I'll add a `from_numpy_array` alternative constructor to help in cases where you want to build a footprint from an existing data array.
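A minimal sketch of option 1 follows. It assumes a free-function mask utility (here the hypothetical `create_mask`) and a constructor that accepts only a mask. The names `Footprint`, `create_mask`, and the any-band validity rule are illustrative assumptions. Only the `from_numpy_array` name comes from the comment above, and its real signature may differ.

```python
from typing import Optional

import numpy as np


def create_mask(data_array: np.ndarray, no_data: Optional[float] = None) -> np.ndarray:
    """Hypothetical free-function utility returning a boolean valid-data mask.

    Assumes (bands, rows, cols) or (rows, cols) input. A pixel is treated as
    valid if any band differs from no_data (an assumed rule).
    """
    arr = data_array if data_array.ndim == 3 else data_array[np.newaxis, ...]
    if no_data is None:
        return np.ones(arr.shape[1:], dtype=bool)
    if np.isnan(no_data):
        # NaN never compares equal, so test for it explicitly.
        return np.any(~np.isnan(arr), axis=0)
    return np.any(arr != no_data, axis=0)


class Footprint:
    """Hypothetical sketch of option 1: the constructor takes a mask, not data."""

    def __init__(self, mask_array: np.ndarray) -> None:
        # Any nonzero value (e.g. 255 from rasterio's read_masks) counts as valid.
        self.mask = mask_array.astype(bool)

    @classmethod
    def from_numpy_array(
        cls, data_array: np.ndarray, no_data: Optional[float] = None
    ) -> "Footprint":
        # Build the mask from raw data, then defer to the mask-based constructor.
        return cls(create_mask(data_array, no_data))
```

With this shape, rasterio-based constructors can pass `read_masks` output straight to `Footprint(...)`. Callers holding only a NumPy array use `Footprint.from_numpy_array(...)` and never have to call the mask utility themselves.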

Performance note: for a single-band ESA WorldCover raster, which is quite large at 36,000 × 36,000 pixels, reading the mask with rasterio takes about 1.3 seconds. The existing NumPy-based mask generator takes about 45 seconds.
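For scale, the reported figures work out as follows. This is simple arithmetic on the two timings above, not a new measurement:

$$
\begin{aligned}
36\,000 \times 36\,000 &= 1.296 \times 10^{9}\ \text{pixels} \\
\text{speedup} &= \frac{45\ \text{s}}{1.3\ \text{s}} \approx 34.6 \\
\text{throughput (rasterio)} &\approx \frac{1.296 \times 10^{9}\ \text{pixels}}{1.3\ \text{s}} \approx 1.0 \times 10^{9}\ \text{pixels/s} \\
\text{throughput (NumPy)} &\approx \frac{1.296 \times 10^{9}\ \text{pixels}}{45\ \text{s}} \approx 2.9 \times 10^{7}\ \text{pixels/s}
\end{aligned}
$$

A roughly 35× speedup is consistent with the opening comment's estimate of more than an order of magnitude.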