r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/

Error in vignettes code: `Unknown WKB type (648)! Full WKB type number was (50331648)` #154

Open barracuda156 opened 4 weeks ago

barracuda156 commented 4 weeks ago

I get an error when the vignette code is run:

--->  Testing R-spdep
Executing:  cd "/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-spdep/R-spdep/work/spdep" && /opt/local/bin/R CMD check ./spdep_1.3-4.tar.gz --no-manual --no-build-vignettes 
* using log directory ‘/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-spdep/R-spdep/work/spdep/spdep.Rcheck’
* using R version 4.4.0 (2024-04-24)
* using platform: powerpc-apple-darwin10.0.0d2 (32-bit)
* R was compiled by
    gcc-mp-13 (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0
    GNU Fortran (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0
* running under: OS X Snow Leopard 10.6
* using session charset: UTF-8
* using options ‘--no-manual --no-build-vignettes’
* checking for file ‘spdep/DESCRIPTION’ ... OK
* this is package ‘spdep’ version ‘1.3-4’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... NOTE
Package suggested but not available for checking: ‘tmap’
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘spdep’ can be installed ... OK
* used C compiler: ‘gcc-mp-13 (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0’
* used SDK: ‘NA’‘NA’‘NA’‘NA’‘NA’‘NA’
* checking installed package size ... NOTE
  installed size is  8.1Mb
  sub-directories of 1Mb or more:
    doc   5.3Mb
    etc   1.4Mb
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking code files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking whether startup messages can be suppressed ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking compiled code ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘tinytest.R’
 OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes ... OK
* checking running R code from vignettes ...
  ‘CO69.Rmd’ using ‘UTF-8’... OK
  ‘nb.Rmd’ using ‘UTF-8’... OK
  ‘nb_sf.Rmd’ using ‘UTF-8’... OK
  ‘sids.Rmd’ using ‘UTF-8’... failed
 ERROR
Errors in running code in vignettes:
when running code in ‘sids.Rmd’
  ...
Spherical geometry (s2) switched off

> plot(st_geometry(nc), axes = TRUE)

> text(st_coordinates(st_centroid(st_geometry(nc), of_largest_polygon = TRUE)), 
+     label = nc$FIPSNO, cex = 0.5)

  When sourcing ‘sids.R’:
Error: Unknown WKB type (648)! Full WKB type number was (50331648).
Execution halted

* checking re-building of vignette outputs ... SKIPPED
* DONE

Status: 1 ERROR, 2 NOTEs

Any idea what is going wrong?

rsbivand commented 3 weeks ago

This looks like https://github.com/r-spatial/spdep/blob/main/vignettes%2Fsids.Rmd#L51. Your platform appears to be rather outdated, so the underlying problem is not connected to this package. Can sf read this or any other external file? What versions of R, sf and GDAL are installed?

rsbivand commented 3 weeks ago

Please run and report all output:

library(sf)
packageVersion("sf")
sf_extSoftVersion()
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
st_crs(nc) <- "+proj=longlat +datum=NAD27"
sf_use_s2(FALSE)
a <- st_geometry(nc)
plot(a, axes=TRUE)
b <- st_centroid(a, of_largest_polygon=TRUE)
c <- st_coordinates(b)
text(c, label=nc$FIPSNO, cex=0.5)

If nc cannot be read, the problem may be in the GDAL version used with sf. If the plot of a works, then GDAL is OK for 32-bit and big-endian data. If the failure is at the creation of b, it would suggest that GEOS is unhappy with 32-bit and/or big-endian data.
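
The triage logic above amounts to running the steps one at a time and noting which one fails first. A minimal base R sketch of that idea follows; first_failure is a hypothetical helper, not part of sf or spdep, and the toy steps merely stand in for the st_read(), plot() and st_centroid() calls:

```r
# Hypothetical helper: evaluate named steps in order in a shared environment
# and return the name of the first step that raises an error (NA if none fail).
first_failure <- function(steps, env = new.env()) {
  for (name in names(steps)) {
    ok <- tryCatch({
      eval(steps[[name]], envir = env)
      TRUE
    }, error = function(e) {
      message(name, " failed: ", conditionMessage(e))
      FALSE
    })
    if (!ok) return(name)
  }
  NA_character_
}

# Toy steps standing in for read, plot and centroid; the third fails on purpose.
steps <- list(
  read     = quote(x <- 1:3),
  plot     = quote(y <- sum(x)),
  centroid = quote(stop("simulated failure"))
)
first_failure(steps)
```

In the setting of this issue, the three steps would be the st_read(), plot() and st_centroid() calls from the script above, so that the name returned points at GDAL, plotting, or the centroid computation respectively.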

barracuda156 commented 3 weeks ago

@rsbivand Thank you very much. Here is what I get:

36-231% /opt/local/bin/R

R version 4.4.0 (2024-04-24) -- "Puppy Cup"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.0.0d2 (32-bit)

> library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.5, PROJ 9.4.0; sf_use_s2() is TRUE
> packageVersion("sf")
[1] ‘1.0.16’
> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
      "3.12.1"        "3.8.5"        "9.4.0"         "true"         "true" 
          PROJ 
       "9.4.0" 
> nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
Reading layer `sids' from data source 
  `/opt/local/Library/Frameworks/R.framework/Versions/4.4/Resources/library/spData/shapes/sids.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 100 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
CRS:           NA
> st_crs(nc) <- "+proj=longlat +datum=NAD27"
> sf_use_s2(FALSE)
Spherical geometry (s2) switched off
> a <- st_geometry(nc)
> plot(a, axes=TRUE)
> b <- st_centroid(a, of_largest_polygon=TRUE)
Error in CPL_geodetic_area(st_geometry(x), p$SemiMajor, p$InvFlattening) : 
  Unknown WKB type (648)! Full WKB type number was (50331648).
> c <- st_coordinates(b)
Error: object 'b' not found
> text(c, label=nc$FIPSNO, cex=0.5)
Error in as.double(y) : 
  cannot coerce type 'builtin' to vector of type 'double'

Plot generation seems to have worked fine. [attached image: plot]

Should I open an issue with GEOS upstream then?

rsbivand commented 3 weeks ago

Thanks! Could you try:

b <- st_centroid(a, of_largest_polygon=FALSE)

If that fails, next:

st_crs(a) <- NA
b <- st_centroid(a, of_largest_polygon=TRUE)

barracuda156 commented 3 weeks ago

> b <- st_centroid(a, of_largest_polygon=FALSE)
Warning message:
In st_centroid.sfc(a, of_largest_polygon = FALSE) :
  st_centroid does not give correct centroids for longitude/latitude data
> st_crs(a) <- NA
> b <- st_centroid(a, of_largest_polygon=TRUE)
> c <- st_coordinates(b)
> text(c, label=nc$FIPSNO, cex=0.5)
Error in text.default(c, label = nc$FIPSNO, cex = 0.5) : 
  plot.new has not been called yet

The last command launched X window, but an empty one.

rsbivand commented 3 weeks ago

That error is caused by plot(a) not having been called in the same context. This script:

library(sf)
packageVersion("sf")
sf_extSoftVersion()
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
st_crs(nc) <- "+proj=longlat +datum=NAD27"
sf_use_s2(FALSE)
a <- st_geometry(nc)
plot(a, axes=TRUE)
b <- st_centroid(a, of_largest_polygon=FALSE)
c <- st_coordinates(b)
text(c, label=nc$FIPSNO, cex=0.5)

may work; the culprit seems to be CPL_geodetic_area, which I'm still looking for. Please also try:

library(sf)
packageVersion("sf")
sf_extSoftVersion()
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
st_crs(nc) <- "+proj=longlat +datum=NAD27"
sf_use_s2(TRUE)
a <- st_geometry(nc)
plot(a, axes=TRUE)
b <- st_centroid(a, of_largest_polygon=TRUE)
c <- st_coordinates(b)
text(c, label=nc$FIPSNO, cex=0.5)

barracuda156 commented 3 weeks ago

If I proceed after the first warning, then in the end nothing happens: no error, no output.

> st_crs(nc) <- "+proj=longlat +datum=NAD27"
> sf_use_s2(FALSE)
Spherical geometry (s2) switched off
> a <- st_geometry(nc)
> plot(a, axes=TRUE)
> b <- st_centroid(a, of_largest_polygon=FALSE)
Warning message:
In st_centroid.sfc(a, of_largest_polygon = FALSE) :
  st_centroid does not give correct centroids for longitude/latitude data
> c <- st_coordinates(b)
> text(c, label=nc$FIPSNO, cex=0.5)

rsbivand commented 3 weeks ago

OK, returning to the original script:

library(sf)
packageVersion("sf")
sf_extSoftVersion()
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
st_crs(nc) <- "+proj=longlat +datum=NAD27"
sf_use_s2(FALSE)
a <- st_geometry(nc)
plot(a, axes=TRUE)
b <- st_centroid(a, of_largest_polygon=TRUE)

Run traceback() after the error. st_centroid calls many other functions to find the largest polygon; the problem may be there.

barracuda156 commented 3 weeks ago

Running this, I got a plot with numbers in the end:

> library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.5, PROJ 9.4.0; sf_use_s2() is TRUE
> packageVersion("sf")
[1] ‘1.0.16’
> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
      "3.12.1"        "3.8.5"        "9.4.0"         "true"         "true" 
          PROJ 
       "9.4.0" 
> nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
Reading layer `sids' from data source 
  `/opt/local/Library/Frameworks/R.framework/Versions/4.4/Resources/library/spData/shapes/sids.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 100 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
CRS:           NA
> st_crs(nc) <- "+proj=longlat +datum=NAD27"
> sf_use_s2(FALSE)
Spherical geometry (s2) switched off
> a <- st_geometry(nc)
> plot(a, axes=TRUE)
> b <- st_centroid(a, of_largest_polygon=FALSE)
Warning message:
In st_centroid.sfc(a, of_largest_polygon = FALSE) :
  st_centroid does not give correct centroids for longitude/latitude data
> c <- st_coordinates(b)
> text(c, label=nc$FIPSNO, cex=0.5)

The second script also works. The result looks the same, at least without comparing the numbers.

[attached image: plot2]

barracuda156 commented 3 weeks ago

Original script version with traceback:

> library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.5, PROJ 9.4.0; sf_use_s2() is TRUE
> packageVersion("sf")
[1] ‘1.0.16’
> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
      "3.12.1"        "3.8.5"        "9.4.0"         "true"         "true" 
          PROJ 
       "9.4.0" 
> nc <- st_read(system.file("shapes/sids.shp", package="spData")[1])
Reading layer `sids' from data source 
  `/opt/local/Library/Frameworks/R.framework/Versions/4.4/Resources/library/spData/shapes/sids.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 100 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
CRS:           NA
> st_crs(nc) <- "+proj=longlat +datum=NAD27"
> sf_use_s2(FALSE)
Spherical geometry (s2) switched off
> a <- st_geometry(nc)
> plot(a, axes=TRUE)
> b <- st_centroid(a, of_largest_polygon=TRUE)
Error in CPL_geodetic_area(st_geometry(x), p$SemiMajor, p$InvFlattening) : 
  Unknown WKB type (648)! Full WKB type number was (50331648).
> traceback()
7: CPL_geodetic_area(st_geometry(x), p$SemiMajor, p$InvFlattening)
6: lwgeom::st_geod_area(x)
5: st_area.sfc(pols)
4: st_area(pols)
3: largest_ring(x[multi])
2: st_centroid.sfc(a, of_largest_polygon = TRUE)
1: st_centroid(a, of_largest_polygon = TRUE)

rsbivand commented 3 weeks ago

The reason for placing the number at the centroid of the largest polygon is that one county is made up of several parts. Thanks for the traceback!

rsbivand commented 3 weeks ago

From https://github.com/r-spatial/lwgeom/blob/9d8078cbfcc33d2739344fa60b9188c6612fcd1d/R/geod.R#L13-L19 could you try:

library(sf)
nc = st_read(system.file("shapes/sids.shp", package="spData")[1])
library(lwgeom)
st_geod_area(nc)

barracuda156 commented 3 weeks ago

This fails:

> library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.5, PROJ 9.4.0; sf_use_s2() is TRUE
> nc = st_read(system.file("shapes/sids.shp", package="spData")[1])
Reading layer `sids' from data source 
  `/opt/local/Library/Frameworks/R.framework/Versions/4.4/Resources/library/spData/shapes/sids.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 100 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
CRS:           NA
> library(lwgeom)
Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.12.1, PROJ 9.4.0

Attaching package: ‘lwgeom’

The following object is masked from ‘package:sf’:

    st_perimeter

> st_geod_area(nc)
Error in st_geod_area(nc) : st_is_longlat(x) is not TRUE

rsbivand commented 3 weeks ago

Sorry, should have set the coordinate reference system to match:

library(sf)
nc = st_read(system.file("shapes/sids.shp", package="spData")[1])
st_crs(nc) <- "+proj=longlat +datum=NAD27"
library(lwgeom)
st_geod_area(nc)

barracuda156 commented 3 weeks ago

And this triggered the error we have seen initially!

> library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.5, PROJ 9.4.0; sf_use_s2() is TRUE
> nc = st_read(system.file("shapes/sids.shp", package="spData")[1])
Reading layer `sids' from data source 
  `/opt/local/Library/Frameworks/R.framework/Versions/4.4/Resources/library/spData/shapes/sids.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 100 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
CRS:           NA
> st_crs(nc) <- "+proj=longlat +datum=NAD27"
> library(lwgeom)
Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.12.1, PROJ 9.4.0

Attaching package: ‘lwgeom’

The following object is masked from ‘package:sf’:

    st_perimeter

> st_geod_area(nc)
Unknown WKB type (296)! Full WKB type number was (100663296).
terminate called after throwing an instance of 'Rcpp::exception'
  what():  lwgeom error

rsbivand commented 3 weeks ago

OK, thanks, I'll raise an issue with lwgeom, ping you, and see if @edzer can see anything. Maybe the LWGEOM code (or the place where the WKB binary geometry representation is converted to what LWGEOM wants) is sensitive to big-endianness and/or 32-bit. I'll change the vignette to use s2 and let you know when I have committed and pushed.

barracuda156 commented 3 weeks ago

Thank you!

P.S. Aside from bitness and endianness, there is one more potential source of obscure breakage: the Darwin ppc ABI uses 4-byte bools. If any alignment or structure size assumes a 1-byte bool, and that matters for code execution, it may cause trouble.

rsbivand commented 3 weeks ago

@barracuda156 Could you please check the vignette as just updated in https://github.com/r-spatial/spdep/commit/1fb494364507cc9539901271c202cad69e61f324 ? [attachment: sids.zip]

edzer commented 3 weeks ago

Does this persist if you set

sf_use_s2(TRUE)

?

rsbivand commented 3 weeks ago

No, that was the second variant in https://github.com/r-spatial/spdep/issues/154#issuecomment-2143769544, and the vignette has been updated to use it. lwgeom seems to fall over on big-endian WKB.
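
A note on the numbers in the two error messages: 50331648 is 0x03000000 and 100663296 is 0x06000000, which are the WKB geometry type codes 3 (Polygon) and 6 (MultiPolygon) read in the opposite byte order. The short codes 648 and 296 are those values modulo 1000, which would fit the ISO WKB convention of encoding dimensionality in the thousands; that the message is built this way is an assumption, not something confirmed in this thread. A minimal base R sketch of the arithmetic (swap_endian is a hypothetical helper name):

```r
# Hypothetical helper: write a 32-bit integer in big-endian byte order,
# then read the same four bytes back as if they were little-endian.
swap_endian <- function(x) {
  bytes <- writeBin(as.integer(x), raw(), size = 4L, endian = "big")
  readBin(bytes, what = "integer", size = 4L, endian = "little")
}

# WKB geometry type codes: 3 = Polygon, 6 = MultiPolygon.
polygon_swapped      <- swap_endian(3L)
multipolygon_swapped <- swap_endian(6L)

print(polygon_swapped)                 # full number in the first error
print(multipolygon_swapped)            # full number in the second error
print(polygon_swapped %% 1000L)        # short code in the first error
print(multipolygon_swapped %% 1000L)   # short code in the second error
```

This fits the traceback: the first error arises after largest_ring() has split the data into polygons, while the second comes from passing the MULTIPOLYGON layer to st_geod_area() directly.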