Open redhog opened 3 years ago
It can also end up being `nan` when very close to or on top of input data, where it should be 0.
Hey, thank you two for reporting!
Can any of you confirm that this is only happening very close to / at observation locations? I am just wondering if we are talking about a bug in the implementation, or numerical issues that can easily be rounded away...
Maybe, in an example where this is happening, you can quickly add a `print(b, l)` in front of this line:
I am just wondering if it's the `l[-1]`, so the last element in the weights array (mu, the Lagrange multiplier), becoming really small close to observation points that is causing the problems.
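For reference, the suggested debug output might look roughly like this (a sketch only; `b` and `l` are the names used in the comment above for the right-hand side of the kriging system and the solved weight vector, and the exact line inside `skgstat/Kriging.py` may differ):

```python
# Sketch of the suggested debug output:
# b: right-hand side of the kriging system, l: solved weight vector,
# l[-1]: mu, the Lagrange multiplier.
print(b, l)
print('mu =', l[-1])  # does mu become tiny close to observation points?
```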
It only happens close to / on top of observations, as far as I can see. We can't really guarantee that we never call `transform()` on the exact same coords as some observation - it's quite likely to happen, even.
OK, then as a first shot I would guess that when an observation location is in the grid passed to `transform`, we end up with a 0 in the kriging matrix. This will lead to a singular matrix, which can't be solved. The class does catch these exceptions and actually counts their occurrences in an internal attribute (`Kriging.singular_error`). In case of an exception, `np.nan` is returned and `Kriging.sigma` is not changed.
The attribute is instantiated here:
https://github.com/mmaelicke/scikit-gstat/blob/ef963b872f92436940361f8e5682a1b6905afa22/skgstat/Kriging.py#L286
Thus, we get a `nan` instead of a zero. We have to keep in mind that there are other causes for a singular matrix, and therefore I would not replace the `nan` with a zero by default, as actually nothing was calculated.
Would it be a possible pathway for you to use the class as is and replace the result at the indexes of input coordinates (if they match) with 0?
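A minimal sketch of that post-processing step, assuming the grid and observation coordinates are plain numpy arrays (the helper name and the tolerance are illustrative, not part of scikit-gstat):

```python
import numpy as np
from scipy.spatial.distance import cdist

def zero_sigma_at_observations(grid_coords, obs_coords, sigma, tol=1e-10):
    """Set the kriging error to 0 where a grid point coincides with an observation.

    grid_coords : (n, 2) coordinates that were passed to transform()
    obs_coords  : (m, 2) observation coordinates
    sigma       : (n,) kriging error, e.g. OrdinaryKriging.sigma after transform()
    tol         : distance below which two points count as identical
    """
    sigma = np.asarray(sigma, dtype=float).copy()
    # distance of every grid point to its nearest observation
    nearest = cdist(grid_coords, obs_coords).min(axis=1)
    sigma[nearest < tol] = 0.0
    return sigma
```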
For the very small values, we can introduce a default precision and round results, i.e. up to 6 decimal places, as `10**-15` precision does not really make sense for an interpolation. As the last step, you could then check whether there are any NaNs left in `sigma` that should not be there, or any negative sigmas that lie outside the chosen precision?
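Roughly like this, assuming `sigma` holds the array taken from the kriging instance after `transform()` (sketch only):

```python
import numpy as np

# round away float noise such as -1.5e-15; 6 decimals as suggested above
sigma = np.round(sigma, 6)

# anything still NaN or negative after rounding deserves a closer look
suspicious = np.isnan(sigma) | (sigma < 0)
if suspicious.any():
    print(f"{suspicious.sum()} sigma values are still NaN or negative")
```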
I think it would be better to do this kind of cleanup in scikit-gstat, but yes, given your explanation, we can't just replace all nans with zeros. But couldn't we use the distance matrix and check for distance < 1e-14 or something and replace all those results?
That sounds very sensible. Maybe an even larger threshold. We should set this as an attribute of the Kriging class, with a default value like that. Besides the technical implications of rounding very small distances away, there is also a scientific one: one could question an interpolation estimate resulting from distances that are many orders of magnitude smaller than the spatial resolution of the Variogram. From my point of view, it does not make sense to discriminate 0.000001 m distances for kriging if the Variogram bins use 1 m lags. So giving the user the possibility to specify the precision of the distance matrix makes sense anyway, I think.
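Just to illustrate the idea (this is not the scikit-gstat implementation, only a sketch of how such a distance-precision attribute could act before the kriging system is even solved; `krige_point` is a placeholder for the regular estimator):

```python
import numpy as np

# Sketch only: the Kriging class carries a distance precision and treats
# target points within that precision of an observation as lying exactly on it.
dist_precision = 1e-10  # hypothetical attribute with a sensible default

def estimate(point, obs_coords, obs_values, krige_point):
    d = np.linalg.norm(obs_coords - point, axis=1)
    hit = d < dist_precision
    if hit.any():
        # the target effectively coincides with an observation:
        # return the observed value and a kriging error of exactly 0
        return obs_values[hit][0], 0.0
    return krige_point(point)  # solve the regular kriging system otherwise
```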
`OrdinaryKriging.sigma` can sometimes end up with very small negative numbers, like `-1.514380e-15`, very close to positions with original data. This does have some practical consequences since we're operating in `np.float64` space: since `OrdinaryKriging.sigma` is defined as `sigma^2`, it is often natural to take the square root (to get values with a more sensible unit), giving `NaN` in this case.
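For reference, the floating-point behaviour described above is easy to reproduce; the clipping at the end is just one possible cleanup, not something scikit-gstat does:

```python
import numpy as np

sigma2 = np.float64(-1.514380e-15)  # a value of the kind reported above
print(np.sqrt(sigma2))              # -> nan (with a RuntimeWarning)

# one possible cleanup: clip tiny negative variances to 0 before the sqrt
print(np.sqrt(np.clip(sigma2, 0.0, None)))  # -> 0.0
```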