jararias / mpsplines

Mean preserving interpolation with splines for 1-D data series
GNU General Public License v3.0

Using min_val is very very slow #3

Open jalder-usgs opened 3 months ago

jalder-usgs commented 3 months ago

Hello, I am using the mpsplines library to run tens to hundreds of thousands of interpolations in parallel loops (via Dask), and I have noticed that including min_val to constrain precipitation makes the routine very slow. I can do about 21k interpolations in ~50 seconds without min_val, but with min_val it just keeps going; I stopped it after three hours. I am not sure whether the code is struggling to make second- or third-order adjustments. When this occurs, the CPU is maxed at 100%, so I wonder if it is stuck in an optimization loop.

We are using the library to interpolate monthly climate model time series to pseudo-daily values, adjusting the month boundaries of the calendar, then re-aggregating back to monthly averages. I can allow the interpolated daily values to be below zero by checking that the monthly average value is above zero (numpy where clause), but this may no longer be mean-preserving.

Is this slowdown from using min_val expected or is there any way to improve the performance so that I can use it? As it stands now, it is impractically slow for my current project.
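One possible way to keep monthly means intact after clipping, sketched here as an illustration only: clip the interpolated daily values at the lower bound, then rescale each month's excess above that bound so the month's mean matches its original value. This is not part of mpsplines; the function name `clip_and_restore_mean` and its arguments are hypothetical, and the result is generally no longer smooth across month boundaries.

```python
import numpy as np

def clip_and_restore_mean(daily, month_ids, monthly_means, min_val=0.0):
    """Clip daily values at min_val, then rescale each month to its target mean.

    Hypothetical helper, not an mpsplines function.
    daily         : 1-D array of interpolated daily values
    month_ids     : 1-D integer array, same length, month index of each day
    monthly_means : target mean for each month index (each must be >= min_val)
    """
    daily = np.asarray(daily, dtype=float)
    month_ids = np.asarray(month_ids)
    out = np.maximum(daily, min_val)          # enforce the lower bound
    for m, target in enumerate(monthly_means):
        if target < min_val:
            raise ValueError(f"month {m}: target mean is below min_val")
        sel = month_ids == m
        excess = out[sel] - min_val           # non-negative by construction
        mean_excess = excess.mean()
        if mean_excess > 0:
            # Scale the excess so the month's mean equals the target again.
            out[sel] = min_val + excess * (target - min_val) / mean_excess
        else:
            # Every day was clipped to min_val; fall back to a flat month.
            out[sel] = target
    return out

# Two toy "months" of three days each; the first dips below zero.
daily = np.array([-1.0, 1.0, 3.0, 2.0, 2.0, 2.0])
month_ids = np.array([0, 0, 0, 1, 1, 1])
fixed = clip_and_restore_mean(daily, month_ids, monthly_means=[1.0, 2.0])
```

Because this is a vectorized post-processing step, it avoids any iterative solver, at the cost of losing the smoothness guarantees of a spline fit.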

jararias commented 3 months ago

Hi. The slowdown when min_val is set is expected, and it is not possible to anticipate how large it will be. The reason is that the splines that go below min_val are re-fitted using a constrained minimization, an iterative numerical algorithm that may or may not be fast depending on how severe the min_val violation is. I didn't find a better approach for the min_val cases and, honestly, I didn't spend much time on that issue, because in my typical use cases I can avoid the problem by doing other things.
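To illustrate why a constrained re-fit costs more than a direct spline evaluation, here is a minimal sketch of the general idea, not the actual mpsplines code. It assumes a single cubic piece on [0, 1], an objective that keeps the coefficients close to the unconstrained ones, an equality constraint on the mean, and a lower bound checked on a grid of points. The function name `refit_with_lower_bound`, the grid size, and the choice of SciPy's SLSQP solver are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize

def refit_with_lower_bound(c_ref, target_mean, min_val=0.0, n_grid=21):
    """Re-fit cubic coefficients (c0..c3) on [0, 1] under two constraints.

    Hypothetical illustration only, not the mpsplines implementation.
    """
    c_ref = np.asarray(c_ref, dtype=float)
    x = np.linspace(0.0, 1.0, n_grid)
    V = np.vander(x, 4, increasing=True)      # rows are [1, x, x^2, x^3]
    # The mean of c0 + c1*x + c2*x^2 + c3*x^3 over [0, 1] is sum(c_k / (k + 1)).
    mean_weights = 1.0 / np.arange(1, 5)

    constraints = [
        # Equality: the interval mean must be preserved.
        {"type": "eq", "fun": lambda c: mean_weights @ c - target_mean},
        # Inequality: the polynomial must stay at or above min_val on the grid.
        {"type": "ineq", "fun": lambda c: V @ c - min_val},
    ]
    # Objective: stay as close as possible to the unconstrained coefficients.
    result = minimize(lambda c: np.sum((c - c_ref) ** 2), c_ref,
                      method="SLSQP", constraints=constraints)
    return result, V, mean_weights

# An unconstrained piece that dips to -0.5 at x = 0.5 but has a mean of 1/6.
c_ref = [1.5, -8.0, 8.0, 0.0]
result, V, mean_weights = refit_with_lower_bound(c_ref, target_mean=1.0 / 6.0)
print(result.success, result.nit)             # solver status and iteration count
```

Each violating piece triggers an iterative solve like this one, and the iteration count (`result.nit`) depends on the problem, which is consistent with the unpredictable slowdown described above.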