pytroll / pyresample

Geospatial image resampling in Python
http://pyresample.readthedocs.org
GNU Lesser General Public License v3.0
344 stars 95 forks source link

Make a Swath definition subclass for interpolated data #552

Open mraspaud opened 10 months ago

mraspaud commented 10 months ago

Feature description

A lot of data we use in our team has geolocation data provided as with tiepoints to interpolate. Python-geotiepoints works as expected for this and it's easy to create a swath definition base on dask arrays (so that the navigation data can be interpolated chunk-wise).

However, for some applications it would be preferable to be able to have access to the interpolation function directly in the swath definition. For example, for computing a granule ring or for getting just a few points out of the data, it would be desirable to interpolate on a case by case basis.

For this, I propose creating a subclass of the Swath definition that would take on initialisation the lower-resolution lon and lat arrays along with a function to call to get interpolated values for a given position in the array. From there, re should be able have a custom get_lonlat method that does the interpolation itself if the number of requested points is smaller than say 100 points.

BENR0 commented 8 months ago

Related: #476 and https://github.com/pytroll/satpy/pull/2283

mraspaud commented 8 months ago

I've been thinking about this a lot during the last couple of days, and I think the approach presented here is actually not optimal. Here is why: First, I'm realizing that we have different ways of interpolating data, depending on the organisation of the tie point (gridded vs non-gridded), and the interpolator chosen. This means that we should have either a very versatile interpolated swath def class, that knows how to handle different interpolation types, or have a unified api for interpolation (which I think would be difficult in the first place, just because of the different use cases). Second, in my experience, the only case we need to interpolate a subset of the full lon/lats arrays is just to create a granule/swath boundary/ring/footprint. Third, there is another use case for keeping track of the tie points, and that is to be able to save the inside a geotiff image (mentioned in #476). For this use case, the gcps could just be an attribute of the new class.

So a simpler solution in my opinion is to let the developer/user use relevant interpolator for its use case, and provide the corresponding dask array. To develop, I have to go in satpy territory here, but I think it's necessary to see the bigger picture. Most of the data we are using in satpy/pytroll provides a lot of metadata, among which the boundary/ring/footprint is provided. IMO, this should be the primary source, instead of computing it. For the data that don't have this data and where the geolocation is provided as tie points, I believe it is best to let the reader of that data determine the best/simplest way to compute the boundary (eg maybe it's as easy as passing the edge tiepoints, or maybe extrapolation is needed). Having a generic way would add a lot of unneeded complexity I think. So, letting the dev of the reader take care of how to interpolate the data and how to find the boundary is I think the best option.

Hence, the solution I suggest now is:

  1. A new SwathDefinition subclass that contains a precomputed boundary
  2. A new SwathDefinition subclass that contains gcps
  3. ...or mixins...
djhoese commented 8 months ago

I think the missed use case here is someone who wants to use numpy arrays without dask involved. Obviously that's not our use case anymore as Satpy developers/users. But I think it is still something to keep in mind.

I haven't re-read the original post, but I remember it being mentioned somewhere that the swath definition could take a callback/interpolator. I guess this would be what you mentioned with:

have a unified api for interpolation (which I think would be difficult in the first place, just because of the different use cases)

But I mean anything can be a callback with two inputs (lons, lats), right? Not that I want to program this myself, but it would be an option that I'm not completely against if it provides something useful...

I think a restructured SwathDefinition like the one I'd like in Pyresample 2.0 with defined interfaces (via Protocols) could mean that someone wanting to do what you originally requested wouldn't have that hard of a time doing it. If we broke down the use cases of the SwathDefinition into .coords, .boundary, etc then anyone could make their own class as long as they implemented these things. The object wouldn't even have to be created with SwathDefinition as a base class.

Bottom line: I'm OK not implementing a specialized class that does the interpolation. I just wanted to point out the solutions for it that I'm not completely against.

mraspaud commented 8 months ago

Thanks for reminding me of that this use case. It is of course valid, but I think it will be a third subclass/mixin then.