r-spatial / gstat

Spatial and spatio-temporal geostatistical modelling, prediction and simulation
http://r-spatial.github.io/gstat/
GNU General Public License v2.0
196 stars 50 forks source link

Using predict.gstat function with nsim>0 causes segfault error #141

Open lea-c opened 6 months ago

lea-c commented 6 months ago

Hi,

I've been encountering a problem I don't understand with the predict.gstat function.

I would like to use it to interpolate residuals using a cokriging model. Running the function with nsim=0 works fine but if I increase it to a larger number (even a small one like 10, although I need to run it eventually with nsim=300) it automatically causes the following error, even on computers with large computing power and memory: *** caught segfault *** address 0x38, cause 'memory not mapped'

Could you help me identify the issue ? I checked for duplicate coordinates with zerodist, for unmatching CRSs, I tried with many values of maxdist and nmax, as well as with fewer points, so I'm at a loss here.

Below is a minimal example, and here is the link to download data: https://filesender.renater.fr/?s=download&token=911d7d4c-6bf5-4bec-a316-8adc5417b9d8 Thank you very much in advance. I remain available for any complementary information.

library(gstat)

# Load variogram load("./var_awc.RData") awc_sem <- cv.fit

# Load data to predict val_sp <- readRDS("./point_data.Rdata")

# Works fine gstat:::predict.gstat(awc_sem, newdata = val_sp)

# Crashes gstat:::predict.gstat(awc_sem, newdata = val_sp, nsim=10, maxdist=500)

lea-c commented 5 months ago

Hi, I've analysed the problem more in depth and it seems to stem from a projection error. The points in val_sp are projected too far away from the points used to fit the variogram and it may cause a memory overload ? This is my hypothesis here. Unfortunately, I still have not been able to solve the issue.

Fyi, here is a dataset for which the 300 simulations can be carried out: https://filesender.renater.fr/?s=download&token=c23e2359-6334-42ae-b425-2127bd302b63 It contains a lot more points that val_sp so this is in line with the fact that the number of points in val_sp plays is not responsible here.

This dataset has the same CRS and coordinates of similar magnitude to val_sp's, but it does not look like their extents overlap. Weirdly, when I plot grd and then val_sp, they look like they are in the exact same projection. grd <- readRDS("./fine_data.Rdata") plot(grd) plot(val_sp, add=T) But when I plot val_sp before grd, an inconsistency appears, and the plots don't overlap. plot(val_sp) plot(grd, add=T)

I've dealt with many CRSs issues before but I cannot figure this one out. I've reprojected the datasets over and over to make sure the CRSs match, but without success. Could you just help me make sure that val_sp and grd align? This will certainly solve the issue. Thank you very much for your help! And again, I remain available if you have any question

edzer commented 5 months ago

Thanks for the update; I've looked at the issue and can reproduce the error you're getting. I can see where it happens when run in a debugger, but haven't been able to identify its cause.

lea-c commented 5 months ago

Thank you very much for looking into this problem! Here is another update: actually I don't think this is a CRS issue. I found a dataset very similar to mine and with the same CRS which does not cause any issue. You can download the new data here... https://filesender.renater.fr/?s=download&token=7bc3178c-2c7a-4fd7-89b4-b15e2afcb419 ...and run the following code:

library(sp)
library(gstat)

# Load variogram
variogram <- readRDS("./variogram.Rdata")

# Load the two datasets
spatial_points_problem <- readRDS("./spatial_points_problem.Rdata")
spatial_points_working <- readRDS("./spatial_points_working.Rdata")

# Compare CRS
proj4string(spatial_points_problem)
proj4string(spatial_points_working)

# Compare extents
plot(extent(spatial_points_problem@bbox), col="red")
plot(extent(spatial_points_working@bbox), col="blue",add=T)

# Check for duplicates
zerodist(spatial_points_problem, z=0.9)
zerodist(spatial_points_working, z=0.9)

# Compare number of points - the only difference here but again, it should not cause any issue
print(nrow(spatial_points_problem))
print(nrow(spatial_points_working))

# Plot the two datasets
plot(spatial_points_problem, col="red")
plot(spatial_points_working, col="blue", add=T)

# Works 
gstat:::predict.gstat(variogram, newdata = spatial_points_working, nsim=10) # I get a warning here due to the fact that I had to modify the dataset a bit but I don't think this is important

# Crashes
gstat:::predict.gstat(variogram, newdata = spatial_points_problem, nsim=10)  # crashes also with various maxdist and nmax values

This is very weird...

lea-c commented 5 months ago

Hi again, I tried to go further in diagnosing the problem and identified the points that may be responsible for the error. Below is a figure where:

The predict.gstat function runs fine with nsim>0 on the green points: however, it crashes as soon as a red point is added.

Also, I have tried to debug the function myself but I have not been able to run it outside the package, because of the C calls.

I hope this piece of additional information is helpful

Best regards

edzer commented 4 months ago

This might be a clue:

> zerodist2(spatial_points_problem, variogram$data$awc_0_30_m$data)
     [,1] [,2]
[1,]  519 1711
> zerodist2(spatial_points_working, variogram$data$awc_0_30_m$data)
     [,1] [,2]
edzer commented 4 months ago
g = gstat:::predict.gstat(variogram, newdata = spatial_points_problem[-519,], nsim=10)

works!

lea-c commented 4 months ago

Sorry for my late reply. It works, thank you so much for your time and effort !

edzer commented 4 months ago

Thanks; note that this is a work-around, the bug is still present.

lea-c commented 3 months ago

Yes, I've noticed: for now I just removed tens of points from my dataset (larger than the one I sent to you). I would be curious to understand what happens.