pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

follow upstream scipy interpolation improvements #7704

Open dcherian opened 1 year ago

dcherian commented 1 year ago

Is your feature request related to a problem?

SciPy 1.10.0 has some great improvements to interpolation (release notes), particularly around the fancier methods like pchip.

It'd be good to see if we can simplify some of our code (or even enable using these options).

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

hollymandel commented 1 month ago

I would like to work on this. For clarity, the goal is to support tensor product interpolation wherever scipy interpn supports it, and remove any no-longer-necessary dimension checks (as in #9049)?

dcherian commented 1 month ago

Nice!

the goal is to support tensor product interpolation wherever scipy interpn supports it, and remove any no-longer-necessary dimension checks

Yes, I believe so.

dcherian commented 1 month ago

We may also want to replace interp1d with our own version that calls make_interp_spline, as mentioned here: https://github.com/pydata/xarray/issues/9404
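As a rough illustration of that idea, the sketch below wraps scipy.interpolate.make_interp_spline in a small helper and compares it with interp1d for the cubic case. The helper name spline_interp1d and its arguments are hypothetical; nothing here is taken from xarray's code.

```python
import numpy as np
from scipy.interpolate import interp1d, make_interp_spline


def spline_interp1d(x, y, order=3, axis=-1):
    """Hypothetical stand-in for interp1d built on make_interp_spline.

    Returns a callable BSpline that interpolates y along `axis`.
    """
    return make_interp_spline(x, y, k=order, axis=axis)


x = np.linspace(0.0, 2.0 * np.pi, 12)
y = np.stack([np.sin(x), np.cos(x)])  # shape (2, 12); interpolate along the last axis
x_new = np.linspace(0.5, 5.5, 7)

new_style = spline_interp1d(x, y, order=3)(x_new)
old_style = interp1d(x, y, kind="cubic", axis=-1)(x_new)

print(np.allclose(new_style, old_style))
```

Both calls interpolate along the last axis, so the two results have the same shape and can be compared directly.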

hollymandel commented 3 weeks ago

The 1d interpolation situation seems to have a few additional inconsistencies, maybe due to updates to the scipy interface.

  1. da.interp() cannot access any of the methods in Interp1dOptions due to the vectorizeable_only argument, as raised by #9049. However, these methods are accessible via da.interpolate_na(). I suppose this could provide a workaround if someone needed these arguments.
  2. I believe that "polynomial" interpolation (in _get_interpolator, ScipyInterpolator) is misnamed and actually refers to spline interpolation, since this is how SciPy interp1d will interpret it. Indeed, true polynomial interpolation does not take a degree argument. True polynomial interpolation is, however, already available through BarycentricInterpolator and KroghInterpolator, which are supported.
  3. I think that the implementation of spline interpolation (e.g. SplineInterpolator, accessible via interpolate_na()) is out of date and is now fitting smoothing splines rather than interpolating splines, perhaps due to the deprecation of the argument nu.
  4. It seems better to me if interpolate_na() were accomplished via a call to da.interp(), or vice versa.
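Point 2 can be checked directly against SciPy, independent of xarray. The sketch below assumes only that an integer degree ends up as the kind argument of interp1d, as described in that point. It compares an integer kind with the named cubic spline and with a true polynomial interpolant through the same nodes.

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator, interp1d

# Runge's function sampled at a few equally spaced nodes.
x = np.linspace(-1.0, 1.0, 7)
y = 1.0 / (1.0 + 25.0 * x**2)
x_new = np.linspace(-0.95, 0.95, 50)

by_order = interp1d(x, y, kind=3)(x_new)           # integer kind is a spline order
by_name = interp1d(x, y, kind="cubic")(x_new)      # cubic spline, selected by name
true_poly = BarycentricInterpolator(x, y)(x_new)   # one degree-6 polynomial through all nodes

same_as_spline = np.allclose(by_order, by_name)
same_as_polynomial = np.allclose(by_order, true_poly)
print(same_as_spline, same_as_polynomial)
```

The integer kind matches the named cubic spline and differs from the single polynomial through all seven nodes, which is why the name "polynomial" looks misleading for that code path.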

I think that these points (except maybe 4) will be resolved in the course of solving the original issue; I just wanted to make sure I'm on the right wavelength.
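For point 3, the difference between smoothing and interpolating splines can be seen in SciPy alone. This sketch assumes the spline path goes through scipy.interpolate.UnivariateSpline with its default smoothing factor; that is an assumption about the implementation, not something confirmed in this thread.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline, UnivariateSpline

x = np.linspace(0.0, 10.0, 20)
y = np.sin(x) + 3.0 * (-1.0) ** np.arange(20)  # deliberately jagged data

smoothing = UnivariateSpline(x, y, k=3)          # default s: a smoothing fit
exact_s0 = UnivariateSpline(x, y, k=3, s=0)      # s=0 forces interpolation
exact = InterpolatedUnivariateSpline(x, y, k=3)  # always interpolates

# Does each spline reproduce the data at the original nodes?
smoothing_hits_data = np.allclose(smoothing(x), y)
s0_hits_data = np.allclose(exact_s0(x), y)
exact_hits_data = np.allclose(exact(x), y)
print(smoothing_hits_data, s0_hits_data, exact_hits_data)
```

Only the default-s fit misses the data points, so a caller who expects interpolation would need s=0 or InterpolatedUnivariateSpline.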

dcherian commented 3 weeks ago

This seems right. I recommend opening as small a PR as possible for easy review rather than a large one that solves many issues. Let us know if you need help.

Does (1) seem like an easier place to start?

hollymandel commented 2 weeks ago

I am having a hard time tracing the impact of the vectorizeability of _interp_func. I wonder if there is some decorator-type behavior about detecting the vectorizeability of functions that is hidden from the stack trace?

The vectorizeable_only flag is not explicitly being passed beyond the choice of interp_class, but the choice of interp function determines the shape of the var that is passed to _interpnd.

Thanks for any insight.

dcherian commented 2 weeks ago

What happens if you set vectorizeable_only to False in interp_func?

hollymandel commented 2 weeks ago

It fails test_interpolate_vectorize in test_interp.py, because if vectorizeable_only is False it will use numpy.interp to interpolate. This eventually results in an "object too deep for desired array" error. The reason this is mysterious to me is that the change makes the input var to interp_func two-dimensional, whereas if you leave the flag at its original value the input var is one-dimensional. I cannot find where the behavior diverges prior to the call of interp_func.
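The numpy.interp part of this can be reproduced outside xarray: numpy.interp only accepts one-dimensional data, so passing a two-dimensional array raises a ValueError. The arrays below are made up for illustration, and the row-wise loop is just one possible workaround, not what xarray does.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
x_new = np.array([0.5, 1.5])
y1d = np.array([0.0, 10.0, 20.0])
y2d = np.stack([y1d, 2.0 * y1d])  # shape (2, 3)

# One-dimensional data works as expected.
ok = np.interp(x_new, x, y1d)

# Two-dimensional data is rejected by numpy.interp.
try:
    np.interp(x_new, x, y2d)
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

# Possible workaround: apply numpy.interp to each row separately.
rowwise = np.apply_along_axis(lambda row: np.interp(x_new, x, row), -1, y2d)
print(raised, rowwise.shape)
```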

dcherian commented 2 weeks ago

Sorry that is a bit gnarly, this module hasn't been touched in a while, so we lack some context.

Do (2) or (3) in your list above feel more approachable?

hollymandel commented 2 weeks ago

I have been unable to reproduce the strange behavior described in my previous comment so I think it's actually behaving reasonably. Thanks for the response. I have submitted a PR related to #9049 and will continue working on this.

hollymandel commented 5 days ago

I've moved on to the implementation of tensor product interpolators via scipy.interpolate.interpn. This has raised the following design question:

To my understanding, a few of the new interpolators (cubic and quintic tensor product splines) are "genuinely multidimensional", so an equivalent result would not be produced by applying a lower-dimensional analogue along dimensions sequentially. However, missing.interp has a built-in optimization to decompose interpolations in this way when possible. The optimization seems to have been introduced in response to this issue. I'm confused that overriding scipy's internal logic results in a speedup, but it seems to be true, and there may be users relying on this optimization.

One solution would be to disable this optimization when a "genuinely multidimensional" interpolator is encountered. This would solve the issue and be backwards-compatible. The only catch is that it would require me to figure out which interpolators are genuinely multidimensional! But the worst-case scenario here is just a missed optimization and perhaps some embarrassment. My real dream would be to "pass the buck to scipy"--write things in a way that does not require any understanding of the scipy interpolators.
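To make the decomposition concrete, here is a minimal sketch using only NumPy and scipy.interpolate.interpn; it does not use xarray's missing.interp. For linear interpolation onto a product grid, a single n-dimensional call and two sequential 1-D passes give the same result. It only covers the linear case and says nothing about whether cubic or quintic methods decompose the same way.

```python
import numpy as np
from scipy.interpolate import interpn

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 2.0, 6)
values = np.sin(x)[:, None] * np.cos(y)[None, :]  # shape (5, 6)

x_new = np.array([0.1, 0.45, 0.8])
y_new = np.array([0.3, 1.1, 1.9, 0.7])

# Joint evaluation: one n-dimensional call on the full target grid.
grid = np.stack(np.meshgrid(x_new, y_new, indexing="ij"), axis=-1)  # shape (3, 4, 2)
joint = interpn((x, y), values, grid, method="linear")

# Decomposed evaluation: 1-D interpolation along x, then along y.
along_x = np.apply_along_axis(lambda col: np.interp(x_new, x, col), 0, values)
sequential = np.apply_along_axis(lambda row: np.interp(y_new, y, row), 1, along_x)

print(np.allclose(joint, sequential))
```

A check of this kind, run per method, might be one way to find out empirically which interpolators can be decomposed safely.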