kbarbary / Dierckx.jl

Julia package for 1-d and 2-d splines
Other
157 stars 30 forks source link

Spline2D fails with large number of unstructured points #66

Closed joeynelson closed 4 years ago

joeynelson commented 4 years ago

Thanks for the great work in wrapping this library. For the work I do spline surface fitting is very useful.

I'm working with a noisy data set, and trying to generate a roughly 20 x 40 fit with kx=3 control points to a data set with a million or more points. When I call Spline2D, it fails trying to convert an out of range lwrk1=2495786006 to an Int32.

In fact the data set is gridded, but with missing values. The missing values (coded as NaNs) seem to cause the 2D Z version to fail (not that surprising). It does work well if I pass in a contiguous region, even with a few million points.

One thought is that lwrk1 is being calculated overly conservatively for the 1D Z case.

Another question is is there any formula for the the returned grid size given the input size and the s parameter? Ideally, I would like to pass in the the grid I want my control points on, but I'm not sure that this is supported by the underlying library.

kbarbary commented 4 years ago

Yeah I don't think it's possible to pass in the grid you want, but you could check out the fortran-level documentation here: https://github.com/DaanVanVugt/libdierckx/blob/master/surfit.f.

lwrk1 is calculated here: https://github.com/kbarbary/Dierckx.jl/blob/master/src/Dierckx.jl#L691-L710 I copied it from the scipy interpolate wrapper, so I'm not really that familiar with how conservative it is.

Aside from revising that calculation, one option would be to allow a user-supplied lwrk1 value or to cap it at typemax(Int32) and hope for the best :crossed_fingers:, hopefully the Fortran library will check that it is big enough and raise an error if not.

joeynelson commented 4 years ago

Thanks for the suggestions.

I believe you are right about the grid. Paul Dierckx's Curve and Surface Fitting with Splines mentions that this isn't supported, but gives a number of a approaches to work around it. I ended up being able to fill the empty values with reasonable interpolated or extrapolated values. I later mask out areas where the fit was based on extrapolation.

I did do some testing with changing the value of lwrk1, and it looks like the fortran code does requires the large wrk1 buffer, and doesn't check this to verify, instead it writes out of bounds if it is not large enough.

For the smoothing value, s. Dierckx suggested playing around until you get the size of output you want.

I have also found two other spline surface fitting libraries both in C++. One is in ALGLIB, which looks promising. The other is part of Geometric Tools, which does give a nice explanation of the math, but doesn't seem to offer much beyond Dierckx routines.