scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License
12.95k stars 5.16k forks source link

griddata with multiple data values #13306

Closed max3-2 closed 1 year ago

max3-2 commented 3 years ago

Hi,

Especially in engineering, griddata is very helpful to move data from one spatial representation to another. Often, this is done with large datasets and thus is time consuming. Also often, the spatial information is the same for many datasets. Thus, a great speedup would be easily achieved when allowing griddata to accept multiple datas at once.

As far as I understand, the heavy lifting is the complex hull and weights calculation, which could be reused easily. An example I found is doing that by unrolling griddata, loosing some performance and a lot of convenience (https://stackoverflow.com/questions/20915502/speedup-scipy-griddata-for-multiple-interpolations-between-two-irregular-grids)

Any plans in this direction?

best

ianthomas23 commented 3 years ago

Have you looked at Matplotlib's triangular grid functionality (https://matplotlib.org/3.3.3/api/tri_api.html)? You can create a single Triangulation from your (x,y) data, then use this triangulation for multiple interpolations. Something like (untested):

import matplotlib.tri as mtri
triang = mtri.Triangulation(x, y)
z1 = mtri.LinearTriInterpolator(triang, z)(x1, y1)
z2 = mtri.LinearTriInterpolator(triang, zother)(x2, y2)

The Delaunay triangulation is calculated once (when creating the Triangulation), and a single TriFinder, which is used for location queries, is used by all interpolators of the same triangulation. See also https://matplotlib.org/3.3.3/gallery/images_contours_and_fields/triinterp_demo.html

pv commented 3 years ago

You can already do it by using LinearNDInterpolator / NearestNDInterpolator / CloughTocher2DInterpolator (for which griddata is a convenience wrapper) with a pre-constructed triangulation. https://docs.scipy.org/doc/scipy-1.5.4/reference/generated/scipy.interpolate.CloughTocher2DInterpolator.html#scipy.interpolate.CloughTocher2DInterpolator

max3-2 commented 3 years ago

Thanks for the input. I do know that it is possible to achieve this with various methods - preferably using the scipy „backend“. In my opinion, however, this is a quite common case and might be an interesting upgrade to the griddata wrapper so its easier to use for high-level access, e.g. when switching interpolation schemes.

pv commented 3 years ago

Yes, it's true griddata should accept a triangulation as "points".

It should also be handle multiple datapoints if they are stacked to an array of size (n, ...) --- that probably works already, but documentation doesn't explain it.

max3-2 commented 3 years ago

I’ll give it a closer look. Guess the tag should be scipy.interpolate?

freshlu11 commented 2 years ago

If you are aiming to use griddata(points,values, method=linear') for interpolating different values from same points to same grid, and want to calculate Delaunay triangulation and weights only once, you can use LinearNDInterpolator, and the values' shape need to be like (number of points, dim of value types); for example:

# here points is original x,y; z1,z2 are z1 = f1(x,y); z2 = f2(x,y)
# and we want  to  interpolate  z1_new, z2_new on grid (lons, lats)
# first stack two types of data, z1, z2, both 2-D array
stack_array = np.vstack((z1.flatten(),z2.flatten())).T
ip = LinearNDInterpolator(points, stack_array)
valuesi = ip((lons, lats))
z1_new=valuesi[...,0]
z2_new=valuesi[...,1]

why we can do this, let's look at the source code in interpnd.pyx

class LinearNDInterpolator(NDInterpolatorBase):
    ...
    def _do_evaluate(self, const double[:,::1] xi, double_or_complex dummy):
    ...
                # here nvalues is stack_array.shape[1], just means two different values
                # to calculate z1_new and z2_new, the isimplex and c(the weights of Linear barycentric interpolation) only once for every grid point.
                for j in range(ndim+1):
                    for k in range(nvalues):
                        m = simplices[isimplex,j]
                        out[i,k] = out[i,k] + c[j] * values[m,k]

        return out
hovavalon commented 1 year ago

I am aiming to use griddata(points,values, method='cubic') with for interpolating different values from same points to same grid, is there a good way to do that? I did some digging in CloughTocher2DInterpolator's code, and it seems that there is no way to skip finding the simplices indicies (which are the same if only the values changed), and also that given an instance of CloughTocher2DInterpolator there is no way to "reset" its values. What is the fastest way to perform this operation?

PS: I have tried updating the values of an existing instance of CloughTocher2DInterpolator, by setting the values attribute and recalculating the grads attribute, but the grads calculation was not consistent with the grads of an instance initially created with these values, Why does this happen? I use the same tolerance and number of iterations.

ev-br commented 1 year ago

This seems to work for values with trailing dimensions already. Starting from the docstring example, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.CloughTocher2DInterpolator.html :

In [55]: CloughTocher2DInterpolator(list(zip(x, y)), z)(0, 0)
Out[55]: array(0.08449745)

In [56]: z2 = np.c_[z, z]

In [57]: CloughTocher2DInterpolator(list(zip(x, y)), z2)(0, 0)
Out[57]: array([0.08449745, 0.08449745])

Updating the values sounds similar to what was done for the RGI: https://github.com/scipy/scipy/pull/17230 I don't think griddata et al supports this.

hovavalon commented 1 year ago

Thanks! The problem is that my values are changing online, and I don't know all of them when I first create the interpolator, hence I am forced to perform the weights calculation each time my values are updated. The pull request you mentioned is indeed relevant, maybe I will try to do something similar with CloughTocher2DInterpolator.

hovavalon commented 1 year ago

I have created a PR which solves my problem, which includes some basic tests and doesn't break the existing API. I would appreciate a review and perhaps some guidance about the technicalities such as documentation and other things I might have missed.

ev-br commented 1 year ago

The refactor of gh-18376 makes it easier to deal with changing values, which was the last u implemented item from this issue. Closing as completed, thanks all.