r-spatial / spatialreg

spatialreg: spatial models estimation and testing
https://r-spatial.github.io/spatialreg/
45 stars 11 forks source link

Anselin Kelejian Test #56

Open JosiahParry opened 2 days ago

JosiahParry commented 2 days ago

After using S2SLS, it is recommended to check for remaining spatial autocorrelation using the Anselin-Kelejian (AK) Test. When the test statistic is significant it suggests that HAC standard errors should be reported to rather than white standard errors.

Is the AK test available yet in spatialreg?

JosiahParry commented 2 days ago

The test is defined here for pysal: https://github.com/pysal/spreg/blob/3ff33bcadbf70f0e1ad61099f0ea8e47966fa03c/spreg/diagnostics_sp.py#L921-L954

rsbivand commented 1 day ago

@JosiahParry My understanding is that the AK test only applies when there are no spatial endogenous variables on the RHS, so non-spatial IV (Anselin & Rey p. 142):

... for the special case where only non-spatial endoogenous variables are included in the model specification (i.e., specifically excluding a spatially lagged dependent variable on the RHS), ...

There are no non-spatial endogenous varables in s2sls, so that AK test is not relevant.

On p. 166-7, the AK_L test looks more like the actual implementation (no equation numbers there), and looks similar to Anselin & Kelejian (1997) p. 163, eq. 11-12. The next question would be whether this might not be better placed in sphet compared to here, since Luc's implementation is almost twenty years old. @gpiras what do you think?

gpiras commented 1 day ago

For what I remember (but I read Anselin and Kelejian a few years ago) the specification includes both a spatial lag and additional endogenous variable. If I recall correctly, they include the separate specification as special cases. As for the suggestion of including the test in sphet I am not really sure it would be a good idea. The user can use the spatial HAC to produce robust standard errors by default, and if they believe that the error follows a spatial autoregressive process specify a SARAR model.

rsbivand commented 1 day ago

@JosiahParry Could you provide a worked example from pysal/spreg for example through reticulate? I'm having trouble with not using q, the example should use wlags=2. It should be possible to calculate AK for special case 1 in s2sls and report it in the summary method.

JosiahParry commented 1 day ago

@rsbivand here you go!

library(reticulate)

# reticulate::virtualenv_create("spatial", packages = c("numpy", "spreg", "libpysal"))
use_virtualenv("spatial")

lp <- import("libpysal")
np <- import("numpy")
spreg <- import("spreg")

columbus <- lp$examples$get_path("columbus.dbf")
columbus_sf <- sf::st_read(columbus)

# oddly passing in `columbus` here doesn't work! 
w <- lp$weights$contiguity$Rook$from_shapefile(lp$examples$get_path("columbus.shp"))
w$transform = "r"

reg <- spreg$GM_Lag(np$array(columbus_sf$HOVAL), np$array(cbind(columbus_sf$INC, columbus_sf$CRIME)), w = w)

reg$ak_test
#> [[1]]
#> [1] 2.075273
#> 
#> [[2]]
#> [1] 0.1497032
rsbivand commented 13 hours ago

reticulate couldn't find my local updated python packages. I have:

import numpy as np
import libpysal
from spreg import GM_Lag, AKtest
db = libpysal.io.open(libpysal.examples.get_path("columbus.dbf"),'r')
y = np.array(db.by_col("CRIME"))
y = np.reshape(y, (49,1))
X = []
X.append(db.by_col("INC"))
X.append(db.by_col("HOVAL"))
X = np.array(X).T
w = libpysal.weights.Rook.from_shapefile(libpysal.examples.get_path("columb\
us.shp"))
w.transform = 'r'
reg_lag = GM_Lag(y, X, w=w)
ak_sp = AKtest(reg_lag, w, case='gen')
print('AK test: %f\tP-value: %f'%(ak_sp.ak, ak_sp.p))
# AK test: 0.147798 P-value: 0.700648

The case='gen' seems to matter; https://github.com/pysal/spreg/blob/3ff33bcadbf70f0e1ad61099f0ea8e47966fa03c/spreg/diagnostics_sp.py#L303-L307.

JosiahParry commented 9 hours ago

Ah, sorry about the python environment issue :/ classic problem.

Good eye! So it seems there is an AK test for the SLM and SARAR scenarios respectively. I was unaware!

rsbivand commented 7 hours ago

Well, I think that 'nosp' is for the case where no \rho y is included but some regressors are endogenous and instrumented, and 'gen' is for the case when \rho y is present and some regressors are endogenous and instrumented. How either perform in our setting where no regressors are endogenous and instrumented is a bit open - s2sls does not admit endogenous regressors. Also:

reg_lag1 = GM_Lag(y, X, w=w, w_lags=1, sig2n_k=True)
reg_lag1.betas.T
# array([[45.94746499, -1.05273965, -0.25984195,  0.4104519 ]])
se_betas(reg_lag1)
# array([11.51533771,  0.39427781,  0.09253382,  0.19386571])

matches:

library(spatialreg)
col <- st_read("/home/rsb/.local/lib/python3.13/site-packages/libpysal/examples/columbus/columbus.shp")
nb <- spdep::poly2nb(col, queen=FALSE)
lw <- spdep::nb2listw(nb)
lag.stsls1 <- stsls(CRIME ~ INC + HOVAL, data=col, lw, W2X=FALSE)
coef(lag.stsls1)
#        Rho (Intercept)         INC       HOVAL 
#  0.4104519  45.9474650  -1.0527396  -0.2598419 
sqrt(diag(lag.stsls1$var))
# [1]  0.19386571 11.51533771  0.39427781  0.09253382

suggesting that stsls needs a sig2n_k argument - it is default False in GM_Lag.