r-spatial / gstat

Spatial and spatio-temporal geostatistical modelling, prediction and simulation
http://r-spatial.github.io/gstat/
GNU General Public License v2.0
194 stars 49 forks source link

krige function shift y-axis coordinates using sf and stars #123

Closed alexyshr closed 5 months ago

alexyshr commented 1 year ago

Dear Dr. Edzer,

I hope your Python learning curve goes fast to improve your wonderful R packages. I knew about your new academic path some weeks ago by checking your GitHub account. Wonderful!

I am simulating a spatial random process from the Gaussian model given an exponential covariance model. Then I am extracting a sample to perform geostatistics and check the resulting covariance vs. the real one. I am using only sfand stars. The issue is that the final map is shifted on the y-axis. I did it first with no CRS (lat, long numbers), and the displacement was even bigger, then I changed to projected coordinates, and the issue persist!

Thank you for your support!

Best regards

 library(maps)
 library(mvtnorm)
 library(gstat)
 library(sf)
 library(ggplot2)

 # Covert map to sf
 ohio = sf::st_as_sf(maps::map("state", "ohio", plot = FALSE, fill = TRUE))
 sf::st_crs(ohio) <-  4326
 ohio = sf::st_transform(ohio, crs=3637) #NAD83(NSRS2007) / Ohio North
 nx = 30
 ny = 30
 ohio |> sf::st_bbox() |> stars::st_as_stars(nx = nx, ny = ny) -> grd
 xg2 = stars::st_get_dimension_values(grd, "x", cell_midpoints = T) # returns cell center coordinates
 yg2 = stars::st_get_dimension_values(grd, "y", cell_midpoints = T) # returns cell centers coordinates

 # Plot sf and stars (as point sf)
 ggplot2::ggplot() +
   geom_sf(data=ohio) + 
   geom_sf(data=sf::st_as_sf(sf::st_crop(grd, ohio), as_points=T))


 # Locations of grid points
 ohgrid.locs2 = sf::st_coordinates(grd)
 ntot2=dim(ohgrid.locs2)[1]  # it has 900

 # Matrix of pairwise distances using sf
 distmat2=as.matrix(dist(ohgrid.locs2))

 #maximum distance
 max_dist2 = max(distmat2)

 # covariance matrix exponential model
 # covariance matrix in terms of the exponential model.
 theta1=50000 #2 # we are making these values up in  (TRUE SILL)
 theta2=111000 #1 order to specify a true covariance matrix (TRUE RANGE). 
 Sig2 = theta1*exp(-distmat2/theta2)

 # To visualize the spatial structure you have specified in this covariance matrix, 
 # plot the 'true' covariogram at 20 equally spaced spatial lags:
 ggplot() +
   xlim(0, max(max_dist2)) +
   geom_function(aes(colour= "True Covariogram"), fun = function(dist) theta1*exp(-dist/theta2), color="red") +
   stat_function(aes(colour= "True Covariogram"), fun = function(dist) theta1*exp(-dist/theta2), geom="point", n= 20) +
   labs(y="cov",x="distance", colour="") +
   theme(legend.position=c(0.8, 0.8))


 # Now simulate a spatial random process from the Gaussian model given a 
 #  mean of 500 and the covariance matrix constructed just above:
 y2 = rmvnorm(1, matrix(500,ntot2,1),Sig2) # takes some time
 dim(y2)     #  1 x 900  # Dimensions of the Gaussian process output is 1 x 900
#> [1]   1 900

 # Convert to stars
 #
 m = matrix(y2,length(xg2),length(yg2)) # NOTE I AM USING y to compare with image output
 m = m[, dim(m)[2]:1]       # reorder the columns to have the same result as image
 dim(m) = c(x = nx, y = ny) # named dimensions

 y2_st = st_as_stars(m, 
                     st_dimensions(x = xg2, y = yg2, cell_midpoints = T))
 sf::st_crs(y2_st) <- 3637

 # plot
 #define palette, colors, breaks
 myPalette <- grDevices::colorRampPalette(rev(rainbow(100,start=0,end=.7)))
 the_min = min(y2)
 the_max = max(y2)
 the_breaks = seq(from = the_min, to = the_max, length.out=6)
 numberofbreaks = length(the_breaks)
 y2_st_srp <- scale_fill_gradientn(colours = myPalette(100), limits=c(the_min, the_max),
                                   breaks=the_breaks, labels=round(the_breaks,1))

 ggplot()+
   geom_stars(data=sf::st_crop(y2_st, ohio)) +
   y2_st_srp +
   geom_sf(data=sf::st_as_sf(grd), fill=NA, color="blue") +
   geom_sf(data= ohio, fill=NA, lwd=1) +
   geom_sf(data=sf::st_as_sf(sf::st_crop(grd, ohio), as_points=T))


 distsample2=as.matrix(dist(dat.locs2))
#> Error in as.matrix(x): object 'dat.locs2' not found
 dim(distsample2)
#> Error in eval(expr, envir, enclos): object 'distsample2' not found
 maxd2 = max(distsample2)/2
#> Error in eval(expr, envir, enclos): object 'distsample2' not found

 #Use a subset of the simulated process as working data for geostatistics

 #With Stars create a plot the points for prediction
 idxkeep2=sort(sample(1:ntot2,round(0.4*ntot2))) # 40% random sampling from a 1 to 900
 ymask2=matrix(0,ntot2,1)
 ymask2[idxkeep2,]=1
 mpred = matrix(ymask2,length(xg2),length(yg2))
 mpred = mpred[, dim(mpred)[2]:1]       # reorder the columns to have the same result as image
 dim(mpred) = c(x = nx, y = ny) # named dimensions

 mpred_st = st_as_stars(mpred, 
                        st_dimensions(x = xg2, y = yg2, cell_midpoints = T)) # all cells 1 and 0
 sf::st_crs(mpred_st) <- 3637

 mvar_sf = sf::st_as_sf(mpred_st, as_points=T) 
 mvar_sf = mvar_sf[mvar_sf$A1 == 1, ]  # to estimate and fit a variogram

 #Estimate the empirical semi-variogram

 # subset the data

 n2=length(idxkeep2)  # number of ones
 dat.locs2= sf::st_coordinates(mvar_sf)

 mvar_st =  aggregate(y2_st, mvar_sf, max)  # to get the variable value
 mvar_sf = sf::st_as_sf(mvar_st, as_points=T)  # instead of 1, now have the variable value

 ggplot()+
   #geom_stars(data=mvar_st) +  # not working
   geom_sf(data=sf::st_as_sf(grd), fill=NA, color="blue") +
   geom_sf(data= ohio, fill=NA, lwd=1) +
   geom_sf(data=mvar_sf)


 # variogram with gstat
 y.v2 = gstat::variogram(A1~1, loc=mvar_sf) # cutoff=2.5; note that distances has changed
 mean(mvar_sf$A1) # estimated trend (mean) of the sample data (the truth trend was 5)
#> [1] 553.7393

 # The true semivariogram can be calculated as
 vt2 = theta1-theta1*exp(-y.v2$dist/theta2)

 # We can plot the empirical semi-variogram and true semi-variogram together. 
 xmax2 = max(y.v2$dist,maxd2) # xmax to set xlim for plot
#> Error in eval(expr, envir, enclos): object 'maxd2' not found
 ymax2 = max(y.v2$gamma, vt2) # ymax to set ylim for plot
 plot(y.v2$dist, y.v2$gamma, xlim = c(0,xmax2), ylim = c(0,ymax2), xlab="Distance", ylab=expression(gamma(h)))
#> Error in plot.default(y.v2$dist, y.v2$gamma, xlim = c(0, xmax2), ylim = c(0, : object 'xmax2' not found
 lines(y.v2$dist, vt2, type="b", pch=20, xlim = c(0,xmax2), ylim = c(0,ymax2)) # add true semivariogram
#> Error in plot.xy(xy.coords(x, y), type = type, ...): object 'xmax2' not found

 # Fit an exponential semi-variogram model

 y.v.wls2 = gstat::fit.variogram(y.v2, vgm(45000, "Exp", 100000, 0), fit.method = 1) # fit.method = 1 wls by default

 par(xaxs = "i", yaxs = "i")
 plot(gamma ~ dist, y.v2,      # empirical semivariogram
      xlim = c(0, 1.075 * max(y.v2$dist)), ylim = c(0, 1.05 * max(y.v2$gamma)),
      xlab = "distance h [m]", ylab = expression(gamma(h)))
 lines(variogramLine(y.v.wls2, 1.075 * max(y.v2$dist)), lty = 1, col = 'blue') #Fitted semivariogram
 lines(y.v$u,vt,type="b",pch=20) # true semivariogram
#> Error in lines(y.v$u, vt, type = "b", pch = 20): object 'y.v' not found
 text(y.v2$dist, y.v2$gamma, y.v2$np, pos = 4)


 # Ordinary kriging
 y.krig.ok2 = gstat::krige(A1~1, loc=mvar_sf, grd, y.v.wls2) 
#> [using ordinary kriging]

 ggplot()+
   geom_stars(data = y.krig.ok2, aes(fill = var1.pred)) +
   y2_st_srp +
   geom_sf(data=sf::st_as_sf(grd), fill=NA, color="blue") +
   geom_sf(data= ohio, fill=NA, lwd=1)

Created on 2023-03-08 with reprex v2.0.2

mgimond commented 1 year ago

The problem seems to be limited to a certain type of stars geometries. For example, the following outputs the correct raster extent when performing an IDW interpolation.

library(sf)
library(stars)
library(gstat)

point   <- st_as_sf(st_as_sfc(  "MULTIPOINT ((20 20), (20 80), (80 80), (80 20))" ))
point <- st_cast(point, "POINT") 
point$z <- c(1,2,3,2)

extent  <- st_as_sfc(  "POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0))" )
extent <- st_as_sf(extent)
r      <- st_rasterize(extent, nx = 100, ny = 100)

out <- idw(z ~ 1, point, newdata = r)

The output looks like this:

image(out, axes = TRUE)
plot(point, pch = 16, col = "red", add = TRUE)
grid()
idw1

The original raster overlaps properly, as expected. (It's shown in a semi-transparent blue raster in the following image).

image(r, add = TRUE, col=rgb(0,0,1,0.3))
idw1b

The extents match, as expected.

st_bbox(out)
xmin ymin xmax ymax 
   0    0  100  100 
st_bbox(r)
xmin ymin xmax ymax 
   0    0  100  100 

Now, a change in a couple of corners of the raster's extent forces a shift in the interpolated raster's y-axis

#' The following geometry generates an interpolated raster that is shifted down
extent  <- st_as_sfc(  "POLYGON ((10 20, 15 85, 100 100, 100 0, 10 20))" )
extent <- st_as_sf(extent)
r      <- st_rasterize(extent, nx = 100, ny = 100)

out <- idw(z ~ 1, point, newdata = r)

# Here's the interpolated raster
image(out, axes = TRUE , ylim = c(-20,100))
plot(point, pch = 16, col = "red", add = TRUE)
grid()

#'The output raster is shifted down
image(r, add = TRUE, col=rgb(0,0,1,0.3))
idw2

The extents differ.

st_bbox(out)
  xmin   ymin   xmax   ymax 
 10.00  -9.95 100.00  90.05
st_bbox(r)
xmin ymin xmax ymax 
  10    0  100  100 

I've played with a few other simple geometries, but can't nail down a pattern. For the example, the following extents work:

extent  <- st_as_sfc(  "POLYGON ((10 20, 0 100, 100 100, 100 0, 10 20))" )  #works
extent  <- st_as_sfc(  "POLYGON ((0 0, 10 80, 100 100, 100 0, 0 0))" )  #works

and the following extents fail:

extent <- st_as_sfc(  "POLYGON ((10 20, 20 50, 10 90, 100 100, 80 50, 100 0, 10 20))" ) # fails
extent <- st_as_sfc(  "POLYGON ((20 0, 50 100, 100 0, 20 0))" ) # fails

Versions: gstat = 2.1.1 stars = 0.6.1 sf = 1.0.13

alexyshr commented 5 months ago

@mgimond I'm sorry for not getting back to you sooner. I need to apologize because my code has errors (that can be avoided). Your code is easier and better to explain the issue. Thank you!

edzer commented 5 months ago

Thanks so much for the great reports, and persistence. The virtue of copy-and-paste. This happened when a stars object had non-square cells.