Error in dimnamesGets(x, value) : invalid dimnames given for “dgRMatrix” object - Githubissues

r-spatial / spatialreg

spatialreg: spatial models estimation and testing

https://r-spatial.github.io/spatialreg/

41 stars 12 forks source link

Error in dimnamesGets(x, value) : invalid dimnames given for “dgRMatrix” object #33

Closed GokuuQ closed 1 year ago

GokuuQ commented 1 year ago

Hi, I have this error in my project:

Error in dimnamesGets(x, value) : 
  invalid dimnames given for “dgRMatrix” object

I used some of the code from Spatial weights objects as sparse matrices and graphs first to get data and neighbor matrix:

library(spatialreg)
library(sf)
columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1])
library(spdep)
nb_q <- poly2nb(columbus)
nb_q

attr(nb_q, "region.id")[1:10]
col2 <- droplinks(nb_q, 21)
View(col2)

nb_B <- nb2listw(col2, style="B", zero.policy=TRUE)

After this, it will be my code for reproducing the error.

First, I want to make sure it works with original data, so I extract some of them and run the lag model, it works.

columbus_extract <- columbus[,7:18] %>% st_set_geometry(NULL)

test <- lagsarlm(HOVAL~., data=columbus_extract, listw = nb_B, zero.policy = TRUE)
summary(test)

Next, I did a similar processing to what I have done in my project. I separated the original data by 2 time variables, which are season and time periods of one day(like 0-3am, 3-6am...) and make the the number of lines has become 32(4 seasons * 8 periods) times of the original. Since there is no time-related data in columbus, I just simply copied the columbus_extract data 32 times, just for keeping the same structure.

columbus_new <- data.frame(matrix(data=NA,nrow = 1568,ncol = 12))

for(i in 1:12){
  columbus_new[,i] <- rep(columbus_extract[,i],32)
}
colnames(columbus_new) <- colnames(columbus_extract)
View(columbus_new)

For the neighbor matrix, I use this code to copy. Now the neighbor matrix is like a combined 32 separated neighbor matrix of the original one.

col2_new <- col2
for(k in 2:32) {
  for(l in 1:49) {
    a <- col2_new[l]
    b <- lapply(a,function(x) x + (k-1L)*49)
    col2_new[[l+(k-1)*49]] <- b[[1]]
  }
}
View(col2_new)

for (i in 1:length(col2_new)){
  col2_new[[i]] <- as.integer(col2_new[[i]])
}

nb_B_new <- nb2listw(col2_new, style="W", zero.policy=TRUE)
View(nb_B_new)

Then when I run the lag model, whether I have added method="LU" or not, I got an error. And in the errorsarlm model I got the same.

test2 <- lagsarlm(HOVAL~., data=columbus_new, listw = nb_B_new, zero.policy = TRUE)

test3 <- lagsarlm(HOVAL~., data=columbus_new, listw = nb_B_new, zero.policy = TRUE,
                  method = "LU")

Error:

Error in dimnamesGets(x, value) : 
  invalid dimnames given for “dgRMatrix” object

But in the SLX model, it's working fine.

test4 <- lmSLX(HOVAL~., data=columbus_new, listw = nb_B_new, zero.policy = TRUE)
summary(test4)

In another case, when I try to assign the new neighbor matrix to "CsparseMatrix" as you have shown in Spatial weights objects as sparse matrices and graphs, I got the same error.

# another case
B <- as(nb_B_new, "CsparseMatrix")

In my case, is there any way to solve this error? And will you recommend using spatial regression in this way？ Thanks!

rsbivand commented 1 year ago

Please provide version details and a reproducible example.

GokuuQ commented 1 year ago

Hi, R version 4.1.3 (2022-03-10) packageVersion("spatialreg"): ‘1.2.3’

I think it's a reproducible case since I used part of your code to get the built-in dataset "columbus" and reshape it to reproduce the error. It might be helpful if you could run the data reshaping part (the part of getting "columbus_new" and "nb_B_new"). By viewing the new reshaped data and listw, you may know what's the problem there.

Thanks!

rsbivand commented 1 year ago

Just a single, minimal code block, not spread out in a message.

GokuuQ commented 1 year ago

This is the code for reproduction

library(spatialreg)
library(sf)
columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1])
library(spdep)
nb_q <- poly2nb(columbus)
col2 <- droplinks(nb_q, 21)
nb_B <- nb2listw(col2, style="B", zero.policy=TRUE)
# the part above is from yours

columbus_extract <- columbus[,7:18] %>% st_set_geometry(NULL)
columbus_new <- data.frame(matrix(data=NA,nrow = 1568,ncol = 12))

for(i in 1:12){
  columbus_new[,i] <- rep(columbus_extract[,i],32)
}
colnames(columbus_new) <- colnames(columbus_extract)
View(columbus_new)

col2_new <- col2
for(k in 2:32) {
  for(l in 1:49) {
    a <- col2_new[l]
    b <- lapply(a,function(x) x + (k-1L)*49)
    col2_new[[l+(k-1)*49]] <- b[[1]]
  }
}
View(col2_new)

for (i in 1:length(col2_new)){
  col2_new[[i]] <- as.integer(col2_new[[i]])
}

nb_B_new <- nb2listw(col2_new, style="W", zero.policy=TRUE)

# error here
test2 <- lagsarlm(HOVAL~., data=columbus_new, listw = nb_B_new, zero.policy = TRUE)

test3 <- lagsarlm(HOVAL~., data=columbus_new, listw = nb_B_new, zero.policy = TRUE)

# another case I got error
B <- as(nb_B_new, "CsparseMatrix")

rsbivand commented 1 year ago

It is impossible to understand what you are doing. First you copy blocks of data to create multiples. Then you do something very odd to the weights matrix, after dropping a link. If your setting is panel data with 32 periods, convert nb_B to CsparseMatrix first, and use kronecker() to generate a weights matrix as in splm, which is for spatial panels.

GokuuQ commented 1 year ago

Thank you for your suggestion. In our case, the dependent variable has different observations at time periods. But most parts of the independent variables are fixed data repeated 32 times, and the other part is dummy variables composed of seasons and time periods. Do you think our data can be considered as spatial panel data?

rsbivand commented 1 year ago

Yes, but in any case inferring from the fixed effects in terms of the variables will be pretty meaningless, unless the number of unique tuples is close to NxT. You should be able to try splm and compare with kronecker() weights and model fitting functions in spatialreg and sphet.

GokuuQ commented 1 year ago

Thank you again!

In my case, since the cell number/individuals are around 4000-5000, I got an error that the weight matrix generated by kronecker() is more than 100GB, which can not be processed. Do you think is there any way to solve it, or do you think the spatial regression will not work for a case like this (with too many cells)?
When I use spml function from splm package, I got the error Error in lag.listw(listw2, u) : Variable contains non-finite values. I found similar issue here https://stat.ethz.ch/pipermail/r-sig-geo/2019-June/027399.html I tried spatialreg::set.ZeroPolicyOption(TRUE) and I'm sure I set zero.policy=TRUE in both listw and spml but still get this error. But sorry I didn't manage to make a reproducible example with built-in data. And here is the result of traceback():
```
7: lag.listw(listw2, u)
6: FUN(X[[i]], ...)
5: lapply(X = ans[index], FUN = FUN, ...)
4: tapply(wyt, inde, function(u) lag.listw(listw2, u), simplify = TRUE)
3: unlist(tapply(wyt, inde, function(u) lag.listw(listw2, u), simplify = TRUE))
2: spfeml(formula = formula, data = data, index = index, listw = listw, 
   listw2 = listw2, na.action, model = model, effects = effects, 
   cl = cl, ...)
1: spml(gbg_temporal_eq2, data = gbg_pdata2, model = "within", lag = T, 
   spatial.error = "b", listw = gbg_nbw_100_ex, zero.policy = TRUE)
```
Do you have any idea about what's going wrong here or anywhere I should check more?

rsbivand commented 1 year ago

Please provide a reproducible example. There is no reason for sparse matrices to go dense unless you choose an approach which causes this. Tracebacks are no use unless they can be reproduced under control and debugging. No help without a reproducible example. Further the splm repository is not here, not on github but https://r-forge.r-project.org/projects/splm/ @gpiras might comment here, but without a fully reproducible example, nobody has any idea what mistakes are being made in your use of the software.

gpiras commented 1 year ago

I totally agree with @rsbivand, you should provide a reproducible example and use the proper channels related to splm

rsbivand commented 1 year ago

The problem is caused by attempts to construct weights objects by hand, ignoring important aspects of the object structures (the "region.id" attribute), and ignoring tools provided in the packages:

> library(spatialreg)
Loading required package: spData
Loading required package: Matrix
Loading required package: sf
Linking to GEOS 3.11.0, GDAL 3.5.2, PROJ 9.1.0; sf_use_s2() is TRUE
> library(sf)
> columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1])
Reading layer `columbus' from data source 
  `/home/rsb/lib/r_libs/spData/shapes/columbus.shp' using driver `ESRI Shapefile'
Simple feature collection with 49 features and 20 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 5.874907 ymin: 10.78863 xmax: 11.28742 ymax: 14.74245
CRS:           NA
> library(spdep)
Loading required package: sp

Attaching package: 'spdep'

The following objects are masked from 'package:spatialreg':

    get.ClusterOption, get.coresOption, get.mcOption,
    get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
    set.coresOption, set.mcOption, set.VerboseOption,
    set.ZeroPolicyOption

> nb_q <- poly2nb(columbus)
> col2 <- droplinks(nb_q, 21)
> nb_B <- nb2listw(col2, style="B", zero.policy=TRUE)
> # the part above is from yours
> columbus_extract <- columbus[,7:18] %>% st_set_geometry(NULL)
> columbus_new <- data.frame(matrix(data=NA,nrow = 1568,ncol = 12))
> 
> for(i in 1:12){
+   columbus_new[,i] <- rep(columbus_extract[,i],32)
+ }
> colnames(columbus_new) <- colnames(columbus_extract)
> str(columbus_new)
'data.frame':   1568 obs. of  12 variables:
 $ HOVAL : num  80.5 44.6 26.4 33.2 23.2 ...
 $ INC   : num  19.53 21.23 15.96 4.48 11.25 ...
 $ CRIME : num  15.7 18.8 30.6 32.4 50.7 ...
 $ OPEN  : num  2.851 5.297 4.535 0.394 0.406 ...
 $ PLUMB : num  0.217 0.321 0.374 1.187 0.625 ...
 $ DISCBD: num  5.03 4.27 3.89 3.7 2.83 3.78 2.74 2.89 3.17 4.33 ...
 $ X     : num  38.8 35.6 39.8 36.5 40 ...
 $ Y     : num  44.1 42.4 41.2 40.5 38 ...
 $ NSA   : num  1 1 1 1 1 1 1 1 1 1 ...
 $ NSB   : num  1 1 1 1 1 1 1 1 1 1 ...
 $ EW    : num  1 0 1 0 1 1 0 0 1 1 ...
 $ CP    : num  0 0 0 0 0 0 0 0 0 0 ...
> B <- as(nb_B, "CsparseMatrix")
> str(B)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:230] 1 2 0 2 3 0 1 3 4 1 ...
  ..@ p       : int [1:50] 0 2 5 9 13 21 23 27 33 41 ...
  ..@ Dim     : int [1:2] 49 49
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:49] "1" "2" "3" "4" ...
  .. ..$ : chr [1:49] "1" "2" "3" "4" ...
  ..@ x       : num [1:230] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()
> kB <- kronecker(Diagonal(32), B) 
> str(kB)
Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:7360] 1 2 0 2 3 0 1 3 4 1 ...
  ..@ j       : int [1:7360] 0 0 1 1 1 2 2 2 2 3 ...
  ..@ Dim     : int [1:2] 1568 1568
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:7360] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()
> test2 <- lagsarlm(HOVAL~., data=columbus_new, listw = mat2listw(as.matrix(kB), style="W"), zero.policy = TRUE)
Warning message:
In nb2listw(res$neighbours, glist = res$weights, style = style,  :
  zero sum general weights
> test2

Call:
lagsarlm(formula = HOVAL ~ ., data = columbus_new, listw = mat2listw(as.matrix(kB), 
    style = "W"), zero.policy = TRUE)
Type: lag 

Coefficients:
          rho   (Intercept)           INC         CRIME          OPEN 
  0.020979270   4.039383045   0.241682915  -0.447899751   0.547255838 
        PLUMB        DISCBD             X             Y           NSA 
  1.709137975   4.328946514   0.001492414   0.772482356 -22.023246743 
          NSB            EW            CP 
 22.516528536   2.975388464   0.877295186 

Log likelihood: -6169.764

The development version of spdep is needed to avoid as.matrix(kB) as current Matrix caused some issues.

> test2 <- lagsarlm(HOVAL~., data=columbus_new, listw = mat2listw(kB, style="W"), zero.policy = TRUE)
Warning messages:
1: In sn2listw(df) :
  21, 70, 119, 168, 217, 266, 315, 364, 413, 462, 511, 560, 609, 658, 707, 756, 805, 854, 903, 952, 1001, 1050, 1099, 1148, 1197, 1246, 1295, 1344, 1393, 1442, 1491, 1540 are not origins
2: In nb2listw(res$neighbours, glist = res$weights, style = style,  :
  zero sum general weights

For splm, the weights object is given once and kronecker() is handled internally if needed.

The manual route may have cycled the time and observation dimensions in the wrong order above.