mmaelicke / scikit-gstat

Geostatistical variogram estimation expansion in the scipy style
https://mmaelicke.github.io/scikit-gstat/
MIT License

Negative sigma #85

Open redhog opened 3 years ago

redhog commented 3 years ago

OrdinaryKriging.sigma can sometimes end up with very small negative values, like -1.514380e-15, very close to positions with original data. This has practical consequences since we are operating in np.float64 space: since OrdinaryKriging.sigma holds the squared error (sigma^2), it is natural to take the square root (to get values in a more sensible unit), which yields NaN in this case.
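A minimal numpy sketch of the failure mode; the value is the one reported above, and the clipping guard is just one possible user-side workaround, not part of scikit-gstat:

```python
import numpy as np

# kriging error variance as reported in the issue: a tiny negative
# number where 0 would be expected
sigma2 = np.float64(-1.514380e-15)

sigma = np.sqrt(sigma2)          # -> nan (plus a RuntimeWarning)
print(np.isnan(sigma))           # True

# one possible guard: clip tiny negatives to zero before the root
sigma_safe = np.sqrt(np.clip(sigma2, 0.0, None))
print(sigma_safe)                # 0.0
```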

eddjharrison commented 3 years ago

It can also end up being NaN when very close to / on top of input data, where it should be 0.

mmaelicke commented 3 years ago

Hey, thank you two for reporting!

Can either of you confirm that this only happens very close to / at observation locations? I am just wondering whether we are talking about a bug in the implementation, or numerical issues that can easily be rounded away...

Maybe, in an example where this is happening, you could quickly add a print(b, l) in front of this line:

https://github.com/mmaelicke/scikit-gstat/blob/ef963b872f92436940361f8e5682a1b6905afa22/skgstat/Kriging.py#L478

I am just wondering if it's l[-1], the last element of the weights array (mu, the Lagrange multiplier), that becomes really small close to observation points and is causing the problem.
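For reference, here is a tiny hand-built ordinary kriging system (two observations, a linear semivariance gamma(h) = h, hypothetical numbers, not scikit-gstat code) showing that l[-1] is the Lagrange multiplier:

```python
import numpy as np

# observations at x=0 and x=1, target at x=0.5, semivariance gamma(h) = h
A = np.array([[0.0, 1.0, 1.0],   # semivariances between observations,
              [1.0, 0.0, 1.0],   # plus the unbiasedness constraint
              [1.0, 1.0, 0.0]])  # row/column of ones
b = np.array([0.5, 0.5, 1.0])    # semivariances to the target point

l = np.linalg.solve(A, b)
weights, mu = l[:-1], l[-1]      # mu is the Lagrange multiplier
print(weights, mu)               # [0.5 0.5] 0.0
```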

redhog commented 3 years ago

It only happens close to / on top of observations, as far as I can see. We can't really guarantee that we never call transform() on the exact same coords as some observation - it's quite likely to happen, even.

mmaelicke commented 3 years ago

OK, then as a first shot I would guess that when an observation location is in the grid passed to transform, we end up with a 0 in the kriging matrix. This leads to a singular matrix, which can't be solved. The class does catch these exceptions and counts their occurrences in an internal attribute (Kriging.singular_error). In case of an exception, np.nan is returned and Kriging.sigma is not changed. The attribute is instantiated here:

https://github.com/mmaelicke/scikit-gstat/blob/ef963b872f92436940361f8e5682a1b6905afa22/skgstat/Kriging.py#L286

Thus, we get a NaN instead of a zero. We have to keep in mind that there are other causes for a singular matrix, and therefore I would not replace the NaN with a zero by default, as actually nothing was calculated.
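The singular-matrix situation can be reproduced with plain numpy (a schematic sketch of the described behavior, not the actual Kriging code): a duplicated location yields two identical rows, np.linalg.solve raises, and the caller falls back to NaN while counting the failure:

```python
import numpy as np

# two observations at the same location -> identical rows -> singular
A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])

singular_error = 0               # mirrors the counting idea described above
try:
    l = np.linalg.solve(A, b)
except np.linalg.LinAlgError:
    singular_error += 1
    z, sigma = np.nan, np.nan    # nothing was calculated

print(singular_error)            # 1
```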

Would it be a possible pathway for you to use the class as is and replace the results at the indices of input coordinates (if they match) with 0? For the very small values, we can introduce a default precision and round results, e.g. to 6 decimal places, as 10**-15 precision does not really make sense for an interpolation. As a last step, you could then check whether there are any NaNs left in sigma that should not be there, or any negative sigmas that lie outside the chosen precision.
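Such a user-side cleanup could look like this (clean_sigma is a hypothetical helper; the names and the 6-decimal default are assumptions, not scikit-gstat API):

```python
import numpy as np

def clean_sigma(sigma, grid, obs_coords, decimals=6):
    """Round the kriging error variance and zero it at exact
    observation locations (hypothetical post-processing helper)."""
    sigma = np.round(np.asarray(sigma, dtype=float), decimals)
    obs = {tuple(c) for c in np.asarray(obs_coords)}
    for i, point in enumerate(np.asarray(grid)):
        if tuple(point) in obs:
            sigma[i] = 0.0       # exact match with an observation
    return sigma

cleaned = clean_sigma(
    sigma=[0.25, -1.5e-15, np.nan],
    grid=[[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]],
    obs_coords=[[1.0, 1.0]],
)
# tiny negative rounded away, NaN at the observation replaced by 0
print(cleaned)
```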

redhog commented 3 years ago

I think it would be better to do this kind of cleanup in scikit-gstat itself, but yes, given your explanation, we can't just replace all NaNs with zeros. But couldn't we use the distance matrix, check for distance < 1e-14 or so, and replace all those results?
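In code, that distance-based variant might look like this (a sketch with plain numpy; the function name and threshold are assumptions, not library API):

```python
import numpy as np

def zero_sigma_near_obs(sigma, grid, obs_coords, tol=1e-14):
    """Set sigma to 0 wherever a grid point lies within `tol`
    of any observation (hypothetical helper)."""
    sigma = np.asarray(sigma, dtype=float).copy()
    grid = np.atleast_2d(grid)
    obs = np.atleast_2d(obs_coords)
    # pairwise distance matrix, shape (n_grid, n_obs)
    d = np.linalg.norm(grid[:, None, :] - obs[None, :, :], axis=-1)
    sigma[(d < tol).any(axis=1)] = 0.0
    return sigma

out = zero_sigma_near_obs(
    sigma=[np.nan, 0.3],
    grid=[[1.0, 1.0], [5.0, 5.0]],
    obs_coords=[[1.0, 1.0]],
)
print(out)                       # [0.  0.3]
```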

mmaelicke commented 3 years ago

That sounds very sensible - maybe with an even larger threshold. We should add it as an attribute of the Kriging class with a default value like that. Besides the technical implications of rounding very small distances away, there is also a scientific one: one could question an interpolation estimate resulting from distances that are many orders of magnitude smaller than the spatial resolution of the Variogram. From my point of view, it does not make sense to discriminate distances of 0.000001 m for kriging if the Variogram bins use 1 meter lags. So giving the user the possibility to specify the precision of the distance matrix makes sense anyway, I think.
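One way such an attribute could look as a sketch (dist_precision is a hypothetical name and default; nothing here exists in scikit-gstat):

```python
import numpy as np

class KrigingSketch:
    """Sketch of a distance-precision setting on a Kriging-like class."""

    def __init__(self, dist_precision=1e-10):
        # distances below this are treated as exact hits, since they lie
        # far below the spatial resolution of the Variogram lags anyway
        self.dist_precision = dist_precision

    def effective_distances(self, d):
        d = np.asarray(d, dtype=float)
        return np.where(d < self.dist_precision, 0.0, d)

k = KrigingSketch(dist_precision=1e-6)
print(k.effective_distances([1e-9, 0.5]))   # [0.  0.5]
```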