r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/
122 stars 26 forks source link

Errors in trying to get neighbors of neigbors #102

Closed xu003822 closed 1 year ago

xu003822 commented 2 years ago

Hi everyone,

I am trying to use the characteristics of neighbors of neighbors of New Jersey's municipalities as instrumental variables. I am using spatial durbin models. But I encounter some errors when some municipalities have no neighbors of neighbors.

One way I am getting the neighbors is using poly2nb function: geo.nb <- poly2nb(geo_2019); where geo_2019 is my dataset; And if there is an island, I am using k-nearest neighbors with k = 2;

I am able to create a .nb list for neighbors of neighbors. But some municipalities have no neighbors of neighbors; for example, municipality 26 has neighbor of 86 only; and 86's neighbor is 26, then both these municipalities have no neighbors of neighbors. So my thought is when I run regressions using r-command, "lagsarlm", the regression will treat municipalities without any neighbors of neighbors as missing values, and ignores it. So I use "nb2listw" command with zero.policy = TRUE first, and then use "lagsarlm" command with zero.poly = TRUE later, but when I rn the command "lagsarlm", one error appeared:

"Error in eigen(similar.listw_Matrix(get("listw", envir = env)), only.values = TRUE) : infinite or missing values in 'x'"

It might have something to do with the observations with no neighbors of neighbors. Any help would be greatly appreciated!!!!

rsbivand commented 2 years ago

Please provide the code and data to reproduce the problem, there is too little here to be able to find out where it occurs.

rsbivand commented 2 years ago

The error message is from spatialreg/R/jacobian.R, line 124. Please provide a listw object that provokes the error.

xu003822 commented 2 years ago

Sorry for the late reply. And thank you for your response. Here is my full code and data:

#data is at https://github.com/xu003822/Community-peer-effect/blob/main/geo_control_certi.RData
library(sf)
library(spdep)
library(ggplot2)
library(rgdal)
library(sp)
library(spatialreg)
library(readxl)
library(data.table)
library(ggplot2)
library(splm)
library(mapview)
library(reshape)

library(tidyverse)

load("geo_control_certi.RData")
geo_2019 <- filter(geo_control_certi, year == 2019)

#create a function that using poly2nb to create neighbors and using nearest neigbor to create neigbor for #islands
addnbs <- function(geo_2019){

  geo.nb <- poly2nb(geo_2019)

  count = card(geo.nb)
  if(!any(count==0 | count==1)){
    return(geo.nb)
  }

  ## get nearest neighbour index, use centroids:
  #getting the nearest neigbor for each municipality#return two nearest neigbor
  nnbs = knearneigh(coordinates(geo_2019), k=2)$nn
  #nnbs is a matrix, cite the number in it using nnbs[i,j]
  #get the index for the municipality that doesn't have a neigbor
  # give k nearest neigbor to the muni that have either 0 or 1 neigbor
  no_edges_from = which(count==0|count==1)
  for(i in no_edges_from){
    for (j in 1:2){
      geo.nb[[i]][j] = nnbs[i,j]
    }
  }
  return(geo.nb)
}

n2 = addnbs(geo_2019)
instru.nei = n2
#------getting neighbor of neighbor 

for(i in 1:length(n2)){  
  k=1
  for(j in 1:length(n2[[i]])){#e.g., n2[[n2[[i]][j]]] = n2[[9]] # x is in (1 38 228 297)# x then is in (1 9 199 297)
     for(xx in n2[[n2[[i]][j]]]) {
        if (!(xx%in%n2[[i]])&(xx!=i)&(!(xx%in%instru.nei[[i]]))){
         instru.nei[[i]][k] = xx
         k=k+1
           }
      }  
  }

    instru.nei[[i]] <- instru.nei[[i]][1:k-1]
}
}

index_zero <- c() # this list will be used to indicate municipality which has no neigbor of neigbor
for (i in 1:324) {
if(length(instru_nofn[[i]])==0)
  index_zero <- append(index_zero, i)
}

#give 0 to the neigobr of neigbor to muni that has no neibor of neibor
for (i in index_zero) {
  instru_nofn[[i]] = as.integer(0)
}

queen.non <- nb2listw(instru_nofn, zero.policy=TRUE)

#--------spatial durbin model (SAR) using 
sar.durbin = lagsarlm(Certified ~ Median_Age + Percent_Children + Percent_Bachelor + Pop_new + Percent_sixty + White + Black + Asian + Hispanic +
                        Percent_Unemployed + Mean_trave_time + Percent_Manufacture + Median_income + Percent_poverty, 
                       data = geo_2019, listw = queen.non, type="mixed", zero.policy=TRUE)

summary(sar.durbin)
rsbivand commented 2 years ago

No, a minimal example using few extraneous packages. Reduce this to as few lines as possible. Do not build the weights object by hand. Use for example spdep::union.nb(), do not hand-modify nb objects unless you want to accept responsibility for the outcome. Neighbours of neighbours by function spdep::nblag(), not by hand.

xu003822 commented 2 years ago

I really appreciate it. Problems solved. Using spde::nblag(); no errors arise. Thanks a lot!

xu003822 commented 2 years ago

By the way, when using lagsarlm() and zero.policy=TRUE, are the geo units that do not have any neighbors treated as missing values?

rsbivand commented 2 years ago

Yes, because the lagged value of an observation with no neighbours is not defined: https://doi.org/10.1007/978-3-662-05617-2_6, recently https://r-spatial.org/book/14-Areal.html.

> library(spdep)
Loading required package: sp
Loading required package: spData
Loading required package: sf
Linking to GEOS 3.11.0, GDAL 3.5.2, PROJ 9.1.0; sf_use_s2() is TRUE
> data(columbus)
> lw <- nb2listw(col.gal.nb)
> lw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49 
Number of nonzero links: 230 
Percentage nonzero weights: 9.579342 
Average number of links: 4.693878 

Weights style: W 
Weights constants summary:
   n   nn S0       S1       S2
W 49 2401 49 23.48489 204.6687
> n.comp.nb(col.gal.nb)
$nc
[1] 1

$comp.id
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1

> x <- columbus$INC
> summary(lag.listw(lw, x))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.539  10.957  14.303  14.742  17.161  26.066 
> nb_1 <- droplinks(col.gal.nb, 1L)
> n.comp.nb(nb_1)
$nc
[1] 2

$comp.id
 [1] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[39] 2 2 2 2 2 2 2 2 2 2 2
> lw_1 <- nb2listw(nb_1, zero.policy=TRUE)
> print(lw_1, zero.policy=TRUE)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49 
Number of nonzero links: 226 
Percentage nonzero weights: 9.412745 
Average number of links: 4.612245 
1 region with no links:
1005

Weights style: W 
Weights constants summary:
   n   nn S0       S1       S2
W 48 2304 48 22.96703 200.7546
> summary(lag.listw(lw_1, x))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  8.539  10.819  14.139  14.559  16.923  26.066       1 
Warning message:
In lag.listw(lw_1, x) : NAs in lagged values
xu003822 commented 2 years ago

Thank you so much. It makes perfect senses.

xu003822 commented 1 year ago

I am trying to run instrument variable regression using 2SLS, using a geo unit's neighbors of neighbor as instruments for its neighbors;

The first stage I am running a regression regressing a geo unit's neigbor's outcome on its neigbor's neigbors' outcomes (this excludes the geo unit as the neigbor's neighbor); Then in the second stage, I will regress the unit's outcome on the predicted outcome from the first stage.

Mathematically, the first stage is $y_j = \beta_0 + \beta_1 y_k + \beta_3 X +\epsilon$; The second stage is $y_i = \beta_0 + \beta_1 y_j + \beta_3 X +\epsilon$. I want $y_k$ to not include $y_i$. However in r, if I use lagsarlm, I have a difficult time running the neigobr's outcome on its neigbor's neigbors' outcomes which excludes the geo unit as the neigbor's neighbor. Any help would be appreciated!

rsbivand commented 1 year ago

Do not use spatialreg::lagsarlm(). With instrumental variables, use sphet::spreg().

xu003822 commented 1 year ago

I appreciate it. Thank you! I just realized that my dependent variable is a discrete choice. Do spreg() or lagsarlm() also apply to limited dependent variables?

On Sun, Oct 2, 2022 at 1:32 PM Roger Bivand @.***> wrote:

Do not use spatialreg::lagsarlm(). With instrumental variables, use sphet::spreg().

— Reply to this email directly, view it on GitHub https://github.com/r-spatial/spdep/issues/102#issuecomment-1264694559, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM6GIQOTRPPUZOFQV5BJHULWBHBKJANCNFSM6AAAAAAQTGH4JM . You are receiving this because you modified the open/close state.Message ID: @.***>

-- Chenyang Xu, Ph.D. Assistant Professor of Economics Dept. of Economics University of Windsor 226-340-9757 http://charliexu.weebly.com

rsbivand commented 1 year ago

No. Consider https://cran.r-project.org/package=spldv. Maybe also https://cran.r-project.org/package=spmoran, but I don't think it admits instruments.