Unexpected behaviour wrt intercepts and W in lmSLX

nk027 commented 2 years ago

Hey, I stumbled upon some arguably unexpected behaviour with the lmSLX() function that exist on version 1.2-1.

It throws an error (probably due to a lagged intercept coefficient) if row-stochastic weights are provided naively.
If the weights are not row-stochastic a spatially lagged intercept is included by default. This is arguably unexpected behaviour -- what do you think?
Also, the impacts of SLX models are not computed in the correct manner (see Issue #23)

I can work on a PR fixing these, but I'll need some time. Below is a quick example.

library("spatialreg")
N <- 100L

# Build connectivity ---
xy <- cbind(runif(N), runif(N)) # Simulate locations
dist <- as.matrix(dist(xy)) # Compute distance
diag(dist) <- Inf # We want zeros on the diagonal
W <- dist^-2 # Compute inverse-distance with decay parameter two
W_rs <- W / rowSums(W) # Use row- and eigenvalue-scaling
W_ev <- W / max(eigen(W, symmetric = TRUE, only.values = TRUE)$values)

# Build DGP ---
X <- matrix(rnorm(N), N)
y <- X * 2 + W_ev %*% X * 1 + rnorm(N, sd = 0.01) # No need for noise

# Estimate using lmSLX() ---
l_rs <- spdep::mat2listw(W_rs)
l_ev <- spdep::mat2listw(W_ev)

lmSLX(y ~ X, listw = l_rs)
# Error in sum_lm_model$coefficients[((m2 + 2):m + LI), 1:2, drop = FALSE] : 
#  subscript out of bounds
lmSLX(y ~ 0 + X, listw = l_rs) # This works
lmSLX(y ~ X, Durbin = ~ 0 + X, listw = l_rs) # This works -- arguably expected
lmSLX(y ~ X, listw = spdep::mat2listw(W_rs, style = "W")) # This works too

lmSLX(y ~ X, listw = l_ev)
# Here, the intercept is lagged too, since W's scaling means it's not just 1
# This may qualifies as unexpected behaviour and can be annoying to avoid
lmSLX(y ~ 0 + X, listw = l_ev) # No intercept
lmSLX(y ~ X, Durbin = ~ 0 + X, listw = l_ev) # Arguably the expected behaviour

nk027 commented 2 years ago

I have been looking into how to address this and I don't think there's a nice way.

Changing the default may break code somewhere.
Leaving it as is makes comparison between row-stochastic and non-row-stochastic matrices a hassle.
Including an argument to allow skipping the lag of the intercept is not very convenient.

rishinsa commented 1 year ago

hi, I have the same issue, have you found the solution?

rsbivand commented 1 year ago

The main problem in this and #23 is the "naive" construction of the listw object. The next problem is omitting the intercept, which is not really covered. The work in creating the WX is connected to the need to drop the lagged intercept if the row sums of W are identical to the unity intercept term. I don't see a great need to accommodate no-intercept formulae, nor to handle the lack of the style= argument in spdep::mat2listw(), which is covered in the reprex. Should spdep::mat2listw() check whether all row sums are unity and if style= is missing, impose "W" and recalculate?

rsbivand commented 1 year ago

I'm adding:

library(spatialreg)
data(oldcol, package="spdep")
lwB <- spdep::nb2listw(COL.nb, style="B")
mod0 <- lmSLX(CRIME ~ INC + HOVAL, data=COL.OLD, listw=lwB)
coef(mod0)
INCB <- spdep::lag.listw(lwB, COL.OLD$INC)
HOVALB <- spdep::lag.listw(lwB, COL.OLD$HOVAL)
INTERCEPTB <- spdep::card(lwB$neighbours)
mod1 <- lm(CRIME ~ INC + HOVAL + INTERCEPTB + INCB + HOVALB, data=COL.OLD)
expect_true(isTRUE(all.equal(unname(coef(mod0)), unname(coef(mod1)))))
expect_warning(modx <- lmSLX(CRIME ~ INC + HOVAL + 0, data=COL.OLD, listw=lwB))
mod1x <- lm(CRIME ~ INC + HOVAL + 0 + INCB + HOVALB, data=COL.OLD)
expect_true(isTRUE(all.equal(unname(coef(modx)), unname(coef(mod1x)))))

as a tinytest. I cannot replicate any of the cases in #23, #24 or #26 (my version above, no made-up neighbours, no mat2listw()), some are because the style component of the listw is set to "M" that is missing by spdep::mat2listw(). I'm adding warnings to that function to try to indicate that not setting the style= argument is unsafe.

r-spatial / spatialreg

Unexpected behaviour wrt intercepts and W in lmSLX #24