mmaelicke / scikit-gstat

Geostatistical variogram estimation expansion in the scipy style
https://mmaelicke.github.io/scikit-gstat/
MIT License
222 stars 53 forks source link

Problem calculating the kriging error #109

Closed guiattard closed 3 years ago

guiattard commented 3 years ago

Hi,

I tried to applied your turotial to a personnal dataset (see attached both .csv dataset & script .ipynb test_scikit.zip

The coordinates of my x,y,z datasets are not normalized, but I've made the required modifications to fit your example.

The variogram and the kriging estimation looks very good. However, I cannot understand the shape of the kriging error. It looks completely offseted/disturbed.

image image

Is there any bug to plot the error or could you help me to udnerstand where the error is coming from ?

Best regards,

mmaelicke commented 3 years ago

Hey Guillaume, That, indeed, looks very strange. At first sight, I would guess that the kriging error plot is somewhat stretched. It does not really make sense to find so variable errors outside of the observation domain. I will need to have a look at your example notebook to see what's going on. In principle, the kriging error output should be of exactly same shape as the interpolation result. And both should align with the passed coordinate arrays. But I'll have a look! Thank you,

Mirko

guiattard commented 3 years ago

Thanks Mirko,

Looking forward to hearing from you,

Cheers,

mmaelicke commented 3 years ago

I got it! Really stupid mistak in my code, Thank you so much for reporting! First, find your code attached with a few changes: I switched to GSTools for Kriging during debugging, you might want to consider using that as well:

test_scikit_MM.zip


Now the details:

First issue: https://github.com/mmaelicke/scikit-gstat/blob/2819ab4a2d0ee6def3665a08e00f7d805ad3f6f8/skgstat/Kriging.py#L298

The error variance matrix was initialized as an empty array, not with NaNs. This lead to unpredictable values in the cells. If an estimation failed, i.e. due to too little neighbors, the random value in the array was used as variance. That's why the weird plot loked different on my computer.

Secondly, and way more serious: If anything went wrong during kriging, like too little neighbors or an ill-conditioned Kriging equation system (which both can happen quite frequently), the intention was, that the variance is just kept as NaN (which it was not). In your case, there were 2000 locations lacking enough neighbors and, thus the kriging function returned here: https://github.com/mmaelicke/scikit-gstat/blob/2819ab4a2d0ee6def3665a08e00f7d805ad3f6f8/skgstat/Kriging.py#L340

and never run this line in the same function: https://github.com/mmaelicke/scikit-gstat/blob/2819ab4a2d0ee6def3665a08e00f7d805ad3f6f8/skgstat/Kriging.py#L347

and that was the problem. That's why the variance field was missing exactly the number of points, which were not estimated. As the Algorithm uses flattened representations of the input data the complete output got messed up.

That was really a tricky one.

Now, the map looks like: image

I am implementing the fixes right now and will release them. I'll close the issue once the changes are online. Best

Mirko

mmaelicke commented 3 years ago

Version 0.6.4 just released.

pip install --upgrade scikit-gstat

guiattard commented 3 years ago

Thanks @mmaelicke, it is a very good news ;-) It works very well now. Very good job !

Thanks to you and your team for giving us a free, clean, and very transparent option to make some geostatistics ;-)