treverhines / RBF

Python package containing the tools necessary for radial basis function (RBF) applications
MIT License
215 stars 50 forks source link

Improve speed of RBF-interpolation #25

Open osu1191 opened 2 years ago

osu1191 commented 2 years ago

Hello Mr. Hines,

I am trying to interpolate a scattered set of data points in 3D (x, y, z, phi) to a regular rectangular 3D grid in a "smooth" manner, i.e. with derivative continuity Using RBFInterpolant, I tend to get pretty accurate performance, but I wanted to know how to improve the speed of the process. My scattered dataset is around 10^4 x, y, z values (x_obs = (52850,3) ; u_obs = (52850,1)) and I want to interpolate them to 10^6 regular grid points (x_itp = 100^3), i.e. a mesh of 100 grid points on each of x, y and z axis.

Currently, it's taking more than 30 mins to perform the interpolation process. I observed that you are using K-NN. Is it this part that's taking the maximum time? Or is it the LU decomposition part? What is the bottleneck? Could you suggest me ways to speed up the performance?

A separate issue is related to memory. When used for x_obs = (225000,3) scattered points, I am getting the error :

_'numpy.core.exceptions.MemoryError: Unable to allocate 377. GiB for an array with shape (225044, 225044) and data type float64'

Is there a way to bypass this?

Thanks in advance, Paul

treverhines commented 2 years ago

Can you try creating the RBFInterpolant with neighbors=50? When you give the neighbors argument, it creates a smaller interpolant for each evaluation point using that many nearest observation points. The downside is that there will be discontinuities in the interpolant wherever the neighborhood changes. Those discontinuities will get smaller with larger value for neighbors.

osu1191 commented 2 years ago

Screen Shot 2022-07-29 at 7 49 42 AM Screen Shot 2022-07-29 at 10 47 32 AM

Hello Mr. Hines, Thanks for the quick response. I have two results here for a small test scattered dataset (x_obs = 9000,3). It's a difference plot of the interpolated and target values. The LHS is obtained using the default parameters. It takes 14s for the interpolant generation, and 78s for evaluating the values (u_itp) at 10^6 locations. RHS is done using "neighbors = 50". This takes 4s for the interpolant, however 278s for evaluating the values. The accuracy deteriorates as you mentioned. Can you tell me the default value of neighbors parameter? Also, can you tell me why the time for evaluation of "u_itp" increases on reducing the neighbors?

Thanks once again, Paul

treverhines commented 2 years ago

Suppose you have N observation points. If neighbors is not given, the RBF interpolation coefficients are found during __init__ by solving a roughly (N, N) system of equations, which can take a while when N is large. The neighbors argument was added to perform faster interpolation when N is large. If neighbors is specified, then no RBF interpolation coefficients are determined at __init__. Instead, the algorithm waits until you have specified evaluation points in __call__ and then it solves for the coefficients for an interpolant using only the observation near those evaluation points. Suppose neighbors is set to K and you have M different evaluation points. This means you would solve M (K, K) systems of equations. If you reduce K, then the system are solved faster, resulting in faster evaluation times.

You may also find more information here https://rbf.readthedocs.io/en/latest/interpolate.html

osu1191 commented 2 years ago
Screen Shot 2022-07-22 at 5 41 42 AM

Thank you so much for the explanation and the link. So, as I understand, this is a difference between global and local interpolation. Mentioning "neighbors" causes local interpolation, which is faster but also creates discontinuities in the interpolated function. I have attached a 2D figure of a model scattered grid in my case. As you see, the grid is dense around the nuclei and gets sparse as I go far away. When doing local interpolation, I believe I will have issues for the evaluation points at these sparse regions of observables, as the neighbors are not really neighbors for these peripheral evaluation points. Can you suggest me a way to deal with this issue and improve the discontinuity/accuracy? For example, I was thinking optimizing the "sigma" or the "eps" parameters?

Thanks once again for the fast response, Paul

treverhines commented 2 years ago

Your understanding of how neighbors works is right. I also see your point about how nearest neighbors is not always the best strategy for picking observations points for local interpolation.

Is there any noise in your observations that would cause you to not want to fit the data exactly? If not, then you should leave the sigma value at 0. Also, the shape parameter eps has no effect for the default value of phi, which is "phs2". You may get smaller discontinuities if you use "phs1" for phi, which is also invariant to eps.

To get better neighborhoods for local interpolation, maybe you can try some sort of transformation that makes the observation points more evenly distributed. For example, warp each point's distance to the center of the grid by log(r + 1).

osu1191 commented 2 years ago

Mr. Hines, thank you once again for your suggestion. I have to learn about the concept of "warping", as I am not familiar with it. I will try out your suggestion. Best regards, Paul.