modflowpy / flopy

A Python package to create, run, and post-process MODFLOW-based models.
https://flopy.readthedocs.io
Other
521 stars 314 forks source link

feature: speed-up resample_to_grid #2169

Open cneyens opened 6 months ago

cneyens commented 6 months ago

Is your feature request related to a problem? Please describe.

The resample_to_grid method allows for the resampling of raster data to the MODFLOW grid. Currently supported raster formats are GeoTiff, ASCII Grid (ESRI ASCII), and Erdas Imagine .img, per the notebook describing its functionality. These formats are rectilinear and have a regular x and/or y spacing. The interpolation algorithm for bilinear, bicubic and nearest-neighbour interpolation which is used in the underlying code (scipy.interpolate.griddata) is intended for unstructured data. We found that the scipy.interpolate.interpn algorithm for rectilinear grids outperforms the former, especially for bilinear and nearest-neighbour interpolation, with a speed-up of several orders of magnitude on my machine (albeit from a rudimentary benchmark; see notebook attached below). For bilinear interpolation from a 1x1 m input raster, interpn is about 10 000 times faster.

Describe the solution you'd like

When the method in resample_to_grid is nearest, bilinear or cubic and the raster input format is rectilinear, use scipy.interpolate.interpn instead of scipy.interpolate.griddata.

Describe alternatives you've considered

The current method scipy.interpolate.griddata, which is slower for rectilinear raster data.

Additional context

Notebook with rudimentary benchmarking for relatively fine structured and vertex grids and rectangular input data at different resolutions, on a Windows 10 i9 3.50 GHz machine using flopy v3.6.0 and scipy v1.13.0. This needs verification.

resample_benchmark.zip

dbrakenhoff commented 6 months ago

Sounds like a good idea :)!

I had to modify your code a bit to get the same result between both optimization methods, but after that change they produce nearly the same result: image

I changed the following, not sure if these are logical changes but at least they produce the same output: Flipping y in create_raster and changing the dimensions of v:

    y = np.arange(0, Ly + 10, delta)[::-1]
    # ...
    v = np.random.rand(ny, nx)

Switching y, and x in interpn:

ip1 = sp.interpolate.interpn((y, x), v, (yc, xc), method=m)

Weirdly, i get one cell where the difference != 0 but i guess that's some edge case that both methods handle differently...? See bottom left corner: image

cneyens commented 6 months ago

Yes, interpn requires marginal vectors instead of grid points, so your change is necessary to obtain similar (i.e. correct interpolation) results. I only tested for performance since I do not expect exactly the same results from the different algorithms.

I can reproduce the single cell difference in the lowerleft corner for the vertex grid using nearest-neighbour interpolation. I guess it's indeed an edge case depending on the geometry of the voronoi grid. Note that for bilinear interpolation, the results do differ more significantly:

resample_linear_vertex