paleolimbot / ggspatial

Enhancing spatial visualization in ggplot2
https://paleolimbot.github.io/ggspatial
368 stars 34 forks source link

Possible issue with **sf** main branch head and GDAL 3.5.0RC1 #100

Closed rsbivand closed 2 years ago

rsbivand commented 2 years ago

Running revdep checks on GDAL 3.5.0RC1 (locally installed cmake build), I see:

> library(ggplot2)
> load_longlake_data()
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
Calls: load_longlake_data ... z_range.MtrxSetSetSet -> zb_wrap -> stopifnot -> CPL_get_z_range
Execution halted
* checking for unstated dependencies in 'tests' ... OK
* checking tests ... ERROR
  Running ‘testthat.R’
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
    3.     ├─sf::st_read(...)
    4.     └─sf:::st_read.character(...)
    5.       └─sf:::process_cpl_read_ogr(...)
    6.         └─sf::st_sfc(geom[[i]], crs = attr(geom[[i]], "crs"))
    7.           └─sf:::compute_z_range(lst)
    8.             ├─sf:::zb_wrap(z_range.MtrxSetSetSet(obj))
    9.             │ └─base::stopifnot(is.numeric(zb) && length(zb) == 2)
   10.             └─sf:::z_range.MtrxSetSetSet(obj)
   11.               ├─sf:::zb_wrap(CPL_get_z_range(obj, 3))
   12.               │ └─base::stopifnot(is.numeric(zb) && length(zb) == 2)
   13.               └─sf:::CPL_get_z_range(obj, 3)

  [ FAIL 3 | WARN 0 | SKIP 21 | PASS 84 ]
  Error: Test failures
  Execution halted
* checking PDF version of manual ... OK
* DONE
Status: 2 ERRORs, 1 NOTE

The same error is present with released sf 1.0-7 and GDAL 3.5.0RC1. It is not present with GDAL 3.4.3 and sf main branch HEAD. sf passes its own tests on GDAL 3.5.0RC1, but whether the problem is in the use of sf code in ggspatial triggering something changed in GDAL is unknown.

One change: GDAL 3.4.3 is built and installed using deprecated autotools, 3.5.0 supports both cmake and autotools and was installed using cmake to access the nacent feaher/arrow vector driver, 3.6 will only support cmake.

We'd like to move to 3.5 on Windows (maybe macOS) because of updates to the RRASTER GDAL driver to support WKT2 in addition to proj4 CRS representation.

paleolimbot commented 2 years ago

Thank you for this!

I've noticed the same error message before but thought it was a bug in my own code...wk::sfc_writer() does its own sfc generation, but I don't use it anywhere by default because it's still experimental). Upon some investigation, though, I think it's a bug in sf (and wk does the "right thing" by filling empties with the proper dimension):

sf::st_as_sfc(c("LINESTRING Z (1 2 3, 4 5 6)", "LINESTRING EMPTY"))
#> Error in CPL_get_z_range(obj, 1): z error - expecting three columns;
sf::st_as_sfc(c("LINESTRING Z (1 2 3, 4 5 6)", "LINESTRING Z EMPTY"))
#> Geometry set for 2 features  (with 1 geometry empty)
#> Geometry type: LINESTRING
#> Dimension:     XYZ
#> Bounding box:  xmin: 1 ymin: 2 xmax: 4 ymax: 5
#> z_range:       zmin: 3 zmax: 6
#> CRS:           NA
#> LINESTRING Z (1 2 3, 4 5 6)
#> LINESTRING Z EMPTY
wk::wk_handle(wk::wkt(c("LINESTRING Z (1 2 3, 4 5 6)", "LINESTRING EMPTY")), wk::sfc_writer())
#> Geometry set for 2 features  (with 1 geometry empty)
#> Geometry type: LINESTRING
#> Dimension:     XY, XYZ
#> Bounding box:  xmin: 1 ymin: 2 xmax: 4 ymax: 5
#> z_range:       zmin: 3 zmax: 6
#> CRS:           NA
#> LINESTRING Z (1 2 3, 4 5 6)
#> LINESTRING EMPTY

Created on 2022-05-08 by the reprex package (v2.0.1)

rsbivand commented 2 years ago

Could you take this further perhaps as an sf issue?

paleolimbot commented 2 years ago

Totally! I'll open an issue there.

rsbivand commented 2 years ago

And please report back here!

paleolimbot commented 2 years ago

I can't seem to replicate the read error...could I trouble you to run:

longlake_folder <- system.file("longlake", package = "ggspatial")
vector_layers <- c(
  "LongLakeDepthSurvey.shp",
  "LongLakeMarshWaterPoly.shp",
  "LongLakeMarshRoads.shp",
  "LongLakeMarshWetlands.shp",
  "LongLakeMarshStreams.shp",
  "LongLakeMarshBuildings.shp"
)

for (f in vector_layers) {
  sf::read_sf(file.path(longlake_folder, f))
}

...and tell me which file fails? (Everything except the first are shapefiles with a Z component)

rsbivand commented 2 years ago
> for (f in vector_layers) {
+ cat(f, "\n")
+ sf::read_sf(file.path(longlake_folder, f))
+ }
LongLakeDepthSurvey.shp 
LongLakeMarshWaterPoly.shp 
LongLakeMarshRoads.shp 
LongLakeMarshWetlands.shp 
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
rsbivand commented 2 years ago

Long form:

> for (f in vector_layers) {
+ cat(f, "\n")
+ print(sf::read_sf(file.path(longlake_folder, f)))
+ }
LongLakeDepthSurvey.shp 
Simple feature collection with 64 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 409967.1 ymin: 5083354 xmax: 411658.7 ymax: 5084777
Projected CRS: UTM_Zone_20_Northern_Hemisphere
# A tibble: 64 × 7
   WAYPOINT_I   LAT   LON DEPTH NOTES          DEPTH_M           geometry
        <dbl> <dbl> <dbl> <dbl> <chr>            <dbl>        <POINT [m]>
 1          2  45.9 -64.1   2.5 mouth of inlet     0.8 (411658.7 5084501)
 2          3  45.9 -64.1   3.1 NA                 0.9 (411630.3 5084560)
 3          5  45.9 -64.1   2.5 NA                 0.8 (411553.4 5084601)
 4          6  45.9 -64.1   2.5 NA                 0.8 (411476.4 5084600)
 5          8  45.9 -64.1   4.5 NA                 1.4 (411466.8 5084488)
 6         10  45.9 -64.1   2   NA                 0.6 (411466.4 5084410)
 7         12  45.9 -64.1   4.7 NA                 1.4 (411379.1 5084490)
 8         16  45.9 -64.1   2.5 NA                 0.8 (411321.2 5084721)
 9         17  45.9 -64.1   4.7 NA                 1.4 (411292.9 5084670)
10         19  45.9 -64.1   5   NA                 1.5 (411290.8 5084593)
# … with 54 more rows
LongLakeMarshWaterPoly.shp 
Simple feature collection with 3 features and 23 fields
Geometry type: POLYGON
Dimension:     XYZ
Bounding box:  xmin: 409949.5 ymin: 5083316 xmax: 412606.5 ymax: 5087084
z_range:       zmin: 7.2 zmax: 9.4
Projected CRS: NAD83 / UTM zone 20N
# A tibble: 3 × 24
  OBJECTID_1 OBJECTID FEAT_CODE HID   PROVKEY ZVALUE STARTDATE  ENDDATE   
       <dbl>    <int> <chr>     <chr> <chr>    <dbl> <date>     <date>    
1      12186    13716 WALK40    NA    NA           0 2009-10-09 NA        
2          0        0 NA        NA    NA           0 1899-12-30 1899-12-30
3      12173    13703 WALK40    NA    NA           0 2009-10-09 NA        
# … with 16 more variables: PRODUCT <chr>, SCALE <chr>, COLLECTOR <chr>,
#   CAPTURE <chr>, PRODYEAR <chr>, PRODMONTH <chr>, X_Y_ACC <chr>, Z_ACC <chr>,
#   MINZ <dbl>, MAXZ <dbl>, POLY_CLASS <int>, SHAPE_LENG <dbl>,
#   SHAPE_LE_1 <dbl>, label <chr>, area <dbl>, geometry <POLYGON [m]>
LongLakeMarshRoads.shp 
Simple feature collection with 46 features and 24 fields
Geometry type: LINESTRING
Dimension:     XYZ
Bounding box:  xmin: 407350.9 ymin: 5081030 xmax: 413277.4 ymax: 5086736
z_range:       zmin: 0 zmax: 19.8
Projected CRS: NAD83 / UTM zone 20N
# A tibble: 46 × 25
   OBJECTID ROADSEGID FEAT_CODE       IDS NID     SEGID STREET MUN_ID TRAFFICDIR
      <dbl>     <dbl> <chr>         <int> <chr>   <dbl> <chr>   <dbl>      <dbl>
 1   239565    810693 RRRDTK50   -2110602 -           0 Track       0          1
 2   239636    810698 RRRDTK50   -2110607 -           0 Track       0          1
 3   239499    810683 RRRDTK50   -2110592 -           0 Track       0          1
 4   193089    762437 RRBRTK50   -2062346 -           0 Track       0          1
 5   193088    762436 RRRDTK50   -2062345 -           0 Track       0          1
 6   267249    840844 RRRDDR50   -2140753 -           0 Drive…      0          1
 7   239501    810685 RRRDTK50   -2110594 -           0 Track       0          1
 8   239496    810680 RRRDTK50   -2110589 -           0 Track       0          1
 9   194817    762408 RRRDTK50   -2062317 -           0 Track       0          1
10    16137    895987 RRRDRADWZ2   166388 1d2d03…     0 NA          0          1
# … with 36 more rows, and 16 more variables: DATE_ACT <dbl>, DATE_REV <dbl>,
#   STRUCTID <chr>, PRODUCT <chr>, SCALE <chr>, COLLECTOR <chr>, CAPTURE <chr>,
#   X_Y_ACC <chr>, Z_ACC <chr>, PRODYEAR <chr>, PRODMONTH <chr>,
#   ROADCLASS <chr>, RTE_NO <dbl>, ANUM <chr>, OWNER <chr>,
#   geometry <LINESTRING [m]>
LongLakeMarshWetlands.shp 
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;

so file 4.

> packageVersion("sf")
[1] '1.0.7'
> sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
      "3.10.2"        "3.5.0"        "9.0.0"         "true"         "true" 
          PROJ 
       "9.0.0" 
paleolimbot commented 2 years ago

I posted the issue in sf but it's a bit hard for me to debug why it's occuring with dev GDAL. I think that st_sfc() is getting passed something from CPL_read_ogr() that has both XY and XYZ dimensions for LongLakeMarshWetlands.shp (debugonce(sf::st_sfc) might help?)

rsbivand commented 2 years ago

Also with released GDAL 3.4.3, and 3.5.0RC1 is forthcoming; I've been tracking 3.5.0 HEAD for about 6 months, as shifting PROJ, GEOS and GDAL to cmake in Rtools42 and macOS-recipes is the next biggish road bump. Also as I mentioned above, I updated the GDAL RRASTER driver to use WKT2 in addition to the proj4 representation, so moving to 3.5.* is encouraged once released.

> library(sf)
> debug("st_read")
> st_read(file.path(longlake_folder, vector_layers[4]))
Browse[3]> table(unlist(lapply(x$geometry, function(y) sapply(y[[1]], ncol))))

 2  3 
 1 88 
Browse[3]> str(x$geometry[[9]])
List of 2
 $ :List of 1
  ..$ : num [1:170, 1:2] 412317 412319 412320 412326 412331 ...
 $ :List of 1
  ..$ : num [1:13, 1:2] 411626 411636 411641 411642 411645 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
Browse[3]> debug(process_cpl_read_ogr)
Browse[4]> 
debug: x[[nm[i]]] = st_sfc(geom[[i]], crs = attr(geom[[i]], "crs"))
Browse[4]> 
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;

Only LongLakeMarshWetlands.shp appears to trigger the problem.

rsbivand commented 2 years ago
> library(sf)
Linking to GEOS 3.10.2, GDAL 3.5.0, PROJ 9.0.0; sf_use_s2() is TRUE
> x <- st_read("LongLakeMarshWetlands.shp")
Reading layer `LongLakeMarshWetlands' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/LongLakeMarshWetlands.shp' 
  using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
> gdal_utils('vectortranslate', source="LongLakeMarshWetlands.shp", destination="llmw.shp", options=c("-of", "ESRI Shapefile"))
> x <- st_read("llmw.shp")
Reading layer `llmw' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.shp' using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
> gdal_utils('vectortranslate', source="LongLakeMarshWetlands.shp", destination="llmw.gpkg", options=c("-of", "GPKG"))
Warning message:
In CPL_gdalvectortranslate(source, destination, options, oo, doo,  :
  GDAL Message 1: A geometry of type MULTIPOLYGON is inserted into layer LongLakeMarshWetlands of geometry type POLYGON, which is not normally allowed by the GeoPackage specification, but the driver will however do it. To create a conformant GeoPackage, if using ogr2ogr, the -nlt option can be used to override the layer geometry type. This warning will no longer be emitted for this combination of layer and feature geometry type.
> x <- st_read("llmw.gpkg")
Reading layer `LongLakeMarshWetlands' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.gpkg' using driver `GPKG'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
> gdal_utils('vectortranslate', source="LongLakeMarshWetlands.shp", destination="llmw.gpkg", options=c("-of", "GPKG", "-nlt", "MULTIPOLYGON"))
> x <- st_read("llmw.gpkg")
Reading layer `LongLakeMarshWetlands' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.gpkg' using driver `GPKG'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
> gdal_utils('vectortranslate', source="LongLakeMarshWetlands.shp", destination="llmw.gpkg", options=c("-of", "GPKG", "-nlt", "MULTIPOLYGONZ"))
> x <- st_read("llmw.gpkg")
Reading layer `LongLakeMarshWetlands' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.gpkg' using driver `GPKG'
Simple feature collection with 18 features and 21 fields
Geometry type: MULTIPOLYGON
Dimension:     XYZ
Bounding box:  xmin: 406919 ymin: 5081208 xmax: 414747.9 ymax: 5089171
z_range:       zmin: 4.5 zmax: 18.4
Projected CRS: NAD83 / UTM zone 20N
> gdal_utils('vectortranslate', source="LongLakeMarshWetlands.shp", destination="llmw.shp", options=c("-of", "ESRI Shapefile", "-nlt", "MULTIPOLYGONZ"))
> x <- st_read("llmw.shp")
Reading layer `llmw' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.shp' using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;
> gdal_utils('vectortranslate', source="llmw.gpkg", destination="llmw.shp", options=c("-of", "ESRI Shapefile"))
> x <- st_read("llmw.shp")
Reading layer `llmw' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.shp' using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;

So the ESRI Shapefile driver appears not to accept mixed with/without Z any longer. I can't see from GDAL NEWS when that might have happened. Note that GPKG succeeds:

> gdal_utils('vectortranslate', source="LongLakeMarshWetlands.shp", destination="llmw.gpkg", options=c("-of", "GPKG", "-nlt", "MULTIPOLYGONZ"))
> x <- st_read("llmw.gpkg")
Reading layer `LongLakeMarshWetlands' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.gpkg' using driver `GPKG'
Simple feature collection with 18 features and 21 fields
Geometry type: MULTIPOLYGON
Dimension:     XYZ
Bounding box:  xmin: 406919 ymin: 5081208 xmax: 414747.9 ymax: 5089171
z_range:       zmin: 4.5 zmax: 18.4
Projected CRS: NAD83 / UTM zone 20N
> table(sapply(x$geom, function(y) class(y)[1]))

XYZ 
 18 

Note also when trying to write:

> st_write(x, "llmw.shp")
Writing layer `llmw' to data source `llmw.shp' using driver `ESRI Shapefile'
Creating or updating layer llmw failed.
Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  : 
  Write error.
In addition: Warning message:
In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
  GDAL Error 6: Geometry type of `3D Multi Polygon' not supported in shapefiles.  Type can be overridden with a layer creation option of SHPT=POINT/ARC/POLYGON/MULTIPOINT/POINTZ/ARCZ/POLYGONZ/MULTIPOINTZ/MULTIPATCH.

Convert to GPKG? Looks as if ESRI Shapefile is EOL:

 xx <- st_cast(x, "POLYGON")
Warning message:
In st_cast.sf(x, "POLYGON") :
  repeating attributes for all sub-geometries for which they may not be constant
> st_write(xx, "llmw.shp", layer_options=c("SHPT=POLYGONZ"))
Writing layer `llmw' to data source `llmw.shp' using driver `ESRI Shapefile'
options:        SHPT=POLYGONZ 
Writing 19 features with 21 fields and geometry type 3D Polygon.
> st_read("llmw.shp")
Reading layer `llmw' from data source 
  `/home/rsb/topics/packages/github-rsb/ll_odd/llmw.shp' using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 3) : z error - expecting three columns;