r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/

Test error: Cannot open layer Parcely #1583

Closed jeroen closed 3 years ago

jeroen commented 3 years ago

On R-universe we have (yet another) build of GDAL. I see this check error: https://github.com/r-universe/r-spatial/runs/1722546400?check_suite_focus=true

  > 
  > if ("GML" %in% st_drivers()$name) {
  +   gml = system.file("gml/fmi_test.gml", package = "sf")
  +   print(dim(st_read(gml, quiet = TRUE)))
  +   gml = system.file("gml/20170930_OB_530964_UKSH.xml.gz", package = "sf")
  +   print(dim(st_read(gml, layer = "Parcely", quiet = TRUE)))
  +   print(dim(st_read(gml, layer = "Parcely", int64_as_string=TRUE, quiet = TRUE)))
  + }
  [1] 22 11
  Cannot open layer Parcely
  Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  : 
    Opening layer failed.
  Calls: print -> st_read -> st_read.character -> CPL_read_ogr
  Execution halted

This could very well be my fault for how I built GDAL. Do you have any guess which driver library needs to be enabled, or whether some datum file is missing?

If you have a Mac, you can install the binary like this to try it locally:

install.packages("sf", repos = "https://r-spatial.r-universe.dev")
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.2.0, PROJ 7.2.0

Thanks!

rsbivand commented 3 years ago

Try st_drivers() and see if "GML" is there. The driver only gets read support if GDAL is built with Xerces or libexpat:

https://gdal.org/drivers/vector/gml.html#vector-gml

edzer commented 3 years ago

That was also what I had in mind; I see libexpat used whenever something is XML-based (as GML is).

jeroen commented 3 years ago

Right now we build with expat but without Xerces. Looks like GML is there:

> st_drivers()
                         name                                            long_name write  copy is_raster is_vector   vsi
ESRIC                   ESRIC                                   Esri Compact Cache FALSE FALSE      TRUE      TRUE  TRUE
PCIDSK                 PCIDSK                                 PCIDSK Database File  TRUE FALSE      TRUE      TRUE  TRUE
netCDF                 netCDF                           Network Common Data Format  TRUE  TRUE      TRUE      TRUE FALSE
PDS4                     PDS4                         NASA Planetary Data System 4  TRUE  TRUE      TRUE      TRUE  TRUE
VICAR                   VICAR                                      MIPL VICAR file  TRUE  TRUE      TRUE      TRUE  TRUE
JP2OpenJPEG       JP2OpenJPEG           JPEG-2000 driver based on OpenJPEG library FALSE  TRUE      TRUE      TRUE  TRUE
PDF                       PDF                                       Geospatial PDF  TRUE  TRUE      TRUE      TRUE FALSE
MBTiles               MBTiles                                              MBTiles  TRUE  TRUE      TRUE      TRUE  TRUE
BAG                       BAG                           Bathymetry Attributed Grid  TRUE  TRUE      TRUE      TRUE  TRUE
EEDA                     EEDA                                Earth Engine Data API FALSE FALSE     FALSE      TRUE FALSE
OGCAPI                 OGCAPI                                               OGCAPI FALSE FALSE      TRUE      TRUE  TRUE
ESRI Shapefile ESRI Shapefile                                       ESRI Shapefile  TRUE FALSE     FALSE      TRUE  TRUE
MapInfo File     MapInfo File                                         MapInfo File  TRUE FALSE     FALSE      TRUE  TRUE
UK .NTF               UK .NTF                                              UK .NTF FALSE FALSE     FALSE      TRUE  TRUE
LVBAG                   LVBAG                          Kadaster LV BAG Extract 2.0 FALSE FALSE     FALSE      TRUE  TRUE
OGR_SDTS             OGR_SDTS                                                 SDTS FALSE FALSE     FALSE      TRUE  TRUE
S57                       S57                                       IHO S-57 (ENC)  TRUE FALSE     FALSE      TRUE  TRUE
DGN                       DGN                                     Microstation DGN  TRUE FALSE     FALSE      TRUE  TRUE
OGR_VRT               OGR_VRT                             VRT - Virtual Datasource FALSE FALSE     FALSE      TRUE  TRUE
REC                       REC                                        EPIInfo .REC  FALSE FALSE     FALSE      TRUE FALSE
Memory                 Memory                                               Memory  TRUE FALSE     FALSE      TRUE FALSE
BNA                       BNA                                            Atlas BNA  TRUE FALSE     FALSE      TRUE  TRUE
CSV                       CSV                         Comma Separated Value (.csv)  TRUE FALSE     FALSE      TRUE  TRUE
GML                       GML                      Geography Markup Language (GML)  TRUE FALSE     FALSE      TRUE  TRUE
GPX                       GPX                                                  GPX  TRUE FALSE     FALSE      TRUE  TRUE
KML                       KML                        Keyhole Markup Language (KML)  TRUE FALSE     FALSE      TRUE  TRUE
GeoJSON               GeoJSON                                              GeoJSON  TRUE FALSE     FALSE      TRUE  TRUE
GeoJSONSeq         GeoJSONSeq                                     GeoJSON Sequence  TRUE FALSE     FALSE      TRUE  TRUE
ESRIJSON             ESRIJSON                                             ESRIJSON FALSE FALSE     FALSE      TRUE  TRUE
TopoJSON             TopoJSON                                             TopoJSON FALSE FALSE     FALSE      TRUE  TRUE
OGR_GMT               OGR_GMT                             GMT ASCII Vectors (.gmt)  TRUE FALSE     FALSE      TRUE  TRUE
GPKG                     GPKG                                           GeoPackage  TRUE  TRUE      TRUE      TRUE  TRUE
SQLite                 SQLite                                  SQLite / Spatialite  TRUE FALSE     FALSE      TRUE  TRUE
OGR_DODS             OGR_DODS                                             OGR_DODS FALSE FALSE     FALSE      TRUE FALSE
ODBC                     ODBC                                                 ODBC  TRUE FALSE     FALSE      TRUE FALSE
WAsP                     WAsP                                     WAsP .map format  TRUE FALSE     FALSE      TRUE  TRUE
PGeo                     PGeo                            ESRI Personal GeoDatabase FALSE FALSE     FALSE      TRUE FALSE
MSSQLSpatial     MSSQLSpatial                Microsoft SQL Server Spatial Database  TRUE FALSE     FALSE      TRUE FALSE
PostgreSQL         PostgreSQL                                   PostgreSQL/PostGIS  TRUE FALSE     FALSE      TRUE FALSE
OpenFileGDB       OpenFileGDB                                         ESRI FileGDB FALSE FALSE     FALSE      TRUE  TRUE
XPlane                 XPlane                 X-Plane/Flightgear aeronautical data FALSE FALSE     FALSE      TRUE  TRUE
DXF                       DXF                                          AutoCAD DXF  TRUE FALSE     FALSE      TRUE  TRUE
CAD                       CAD                                       AutoCAD Driver FALSE FALSE      TRUE      TRUE  TRUE
FlatGeobuf         FlatGeobuf                                           FlatGeobuf  TRUE FALSE     FALSE      TRUE  TRUE
Geoconcept         Geoconcept                                           Geoconcept  TRUE FALSE     FALSE      TRUE  TRUE
GeoRSS                 GeoRSS                                               GeoRSS  TRUE FALSE     FALSE      TRUE  TRUE
GPSTrackMaker   GPSTrackMaker                                        GPSTrackMaker  TRUE FALSE     FALSE      TRUE  TRUE
VFK                       VFK                 Czech Cadastral Exchange Data Format FALSE FALSE     FALSE      TRUE FALSE
PGDUMP                 PGDUMP                                  PostgreSQL SQL dump  TRUE FALSE     FALSE      TRUE  TRUE
OSM                       OSM                            OpenStreetMap XML and PBF FALSE FALSE     FALSE      TRUE  TRUE
GPSBabel             GPSBabel                                             GPSBabel  TRUE FALSE     FALSE      TRUE FALSE
SUA                       SUA      Tim Newport-Peace's Special Use Airspace Format FALSE FALSE     FALSE      TRUE  TRUE
OpenAir               OpenAir                                              OpenAir FALSE FALSE     FALSE      TRUE  TRUE
OGR_PDS               OGR_PDS                         Planetary Data Systems TABLE FALSE FALSE     FALSE      TRUE  TRUE
WFS                       WFS                        OGC WFS (Web Feature Service) FALSE FALSE     FALSE      TRUE  TRUE
OAPIF                   OAPIF                                   OGC API - Features FALSE FALSE     FALSE      TRUE FALSE
HTF                       HTF                         Hydrographic Transfer Vector FALSE FALSE     FALSE      TRUE  TRUE
AeronavFAA         AeronavFAA                                          Aeronav FAA FALSE FALSE     FALSE      TRUE  TRUE
Geomedia             Geomedia                                        Geomedia .mdb FALSE FALSE     FALSE      TRUE FALSE
EDIGEO                 EDIGEO                        French EDIGEO exchange format FALSE FALSE     FALSE      TRUE  TRUE
SVG                       SVG                             Scalable Vector Graphics FALSE FALSE     FALSE      TRUE  TRUE
CouchDB               CouchDB                                   CouchDB / GeoCouch  TRUE FALSE     FALSE      TRUE FALSE
Cloudant             Cloudant                                   Cloudant / CouchDB  TRUE FALSE     FALSE      TRUE FALSE
Idrisi                 Idrisi                                 Idrisi Vector (.vct) FALSE FALSE     FALSE      TRUE  TRUE
ARCGEN                 ARCGEN                                    Arc/Info Generate FALSE FALSE     FALSE      TRUE  TRUE
SEGUKOOA             SEGUKOOA                                 SEG-P1 / UKOOA P1/90 FALSE FALSE     FALSE      TRUE  TRUE
SEGY                     SEGY                                                SEG-Y FALSE FALSE     FALSE      TRUE  TRUE
XLS                       XLS                                      MS Excel format FALSE FALSE     FALSE      TRUE FALSE
ODS                       ODS Open Document/ LibreOffice / OpenOffice Spreadsheet   TRUE FALSE     FALSE      TRUE  TRUE
XLSX                     XLSX                       MS Office Open XML spreadsheet  TRUE FALSE     FALSE      TRUE  TRUE
Elasticsearch   Elasticsearch                                       Elastic Search  TRUE FALSE     FALSE      TRUE FALSE
Walk                     Walk                                                 Walk FALSE FALSE     FALSE      TRUE FALSE
Carto                   Carto                                                Carto  TRUE FALSE     FALSE      TRUE FALSE
AmigoCloud         AmigoCloud                                           AmigoCloud  TRUE FALSE     FALSE      TRUE FALSE
SXF                       SXF                          Storage and eXchange Format FALSE FALSE     FALSE      TRUE  TRUE
Selafin               Selafin                                              Selafin  TRUE FALSE     FALSE      TRUE  TRUE
JML                       JML                                         OpenJUMP JML  TRUE FALSE     FALSE      TRUE  TRUE
PLSCENES             PLSCENES                               Planet Labs Scenes API FALSE FALSE      TRUE      TRUE FALSE
CSW                       CSW               OGC CSW (Catalog  Service for the Web) FALSE FALSE     FALSE      TRUE FALSE
VDV                       VDV                  VDV-451/VDV-452/INTREST Data Format  TRUE FALSE     FALSE      TRUE  TRUE
MVT                       MVT                                  Mapbox Vector Tiles  TRUE FALSE     FALSE      TRUE  TRUE
NGW                       NGW                                          NextGIS Web  TRUE  TRUE      TRUE      TRUE FALSE
MapML                   MapML                                                MapML  TRUE FALSE     FALSE      TRUE  TRUE
TIGER                   TIGER                               U.S. Census TIGER/Line  TRUE FALSE     FALSE      TRUE  TRUE
AVCBin                 AVCBin                             Arc/Info Binary Coverage FALSE FALSE     FALSE      TRUE  TRUE
AVCE00                 AVCE00                        Arc/Info E00 (ASCII) Coverage FALSE FALSE     FALSE      TRUE  TRUE
HTTP                     HTTP                                HTTP Fetching Wrapper FALSE FALSE      TRUE      TRUE FALSE

edzer commented 3 years ago

Am I right that this has been resolved? (I looked at https://github.com/r-universe/r-spatial/actions )

jeroen commented 3 years ago

I'm now seeing another error. Perhaps I am doing something wrong. Are we supposed to pass special flags when building sf against a static gdal/proj?

checking for gdal.h... yes
checking GDAL: linking with --libs only... no
checking GDAL: linking with --libs and --dep-libs... yes
checking GDAL: /usr/local/share/gdal/pcs.csv readable... no
checking GDAL: checking whether PROJ is available for linking:... yes
checking GDAL: checking whether PROJ is available fur running:... ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
ERROR 1: PROJ: proj_create: unrecognized format / unknown name
ERROR 6: Cannot find coordinate operations from `' to `'
no
configure: error: OGRCoordinateTransformation() does not return a coord.trans: PROJ not available?
ERROR: configuration failed for package ‘sf’

Right now I just set the environment variable PROJ_LIB=/usr/local/share/proj and expect sf to query gdal-config, geos-config and pkg-config to find the required flags, but maybe that doesn't work.
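The "Cannot find proj.db" lines in the log suggest that PROJ did not find its data directory while configure was running its test program. A minimal sketch of a sanity check to run before installing, assuming PROJ_LIB is the variable in use; `check_proj_data` is a hypothetical helper, not part of sf, GDAL or PROJ:

```shell
# Hypothetical helper: report whether a directory looks like a usable PROJ data dir.
# It only checks that proj.db exists; it does not validate the database itself.
check_proj_data() {
  dir="$1"
  if [ -n "$dir" ] && [ -f "$dir/proj.db" ]; then
    echo "ok: $dir/proj.db"
    return 0
  fi
  echo "missing: proj.db not found in '${dir}'"
  return 1
}

# Check whatever PROJ_LIB currently points to (it may be unset or empty).
check_proj_data "${PROJ_LIB:-}" || true
```

If this reports "missing" in the same shell session that runs R CMD INSTALL, the variable is probably not reaching the configure script.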

edzer commented 3 years ago

Another question would be (@rsbivand) whether, in this setup, GDAL was statically linked to PROJ. I guess that would be required too? The error message indicates GDAL is there, but GDAL cannot get hold of PROJ (was it not linked against PROJ?).

edzer commented 3 years ago

Some older OSX build notes from Simon are found here: https://github.com/r-spatial/sf/issues/327

rsbivand commented 3 years ago

Not really clear why this would bite - what does it do in rgdal or terra? Are the configure scripts different?

edzer commented 3 years ago

We would find out if they were on the r-spatial github org!

jeroen commented 3 years ago

OK, part of the problem was on my end, as the PROJ_LIB variable was not getting set consistently. That is now solved, but my main question remains: how do I instruct the sf/rgdal configure script to copy the proj/gdal datum files into the R package, so that we get a redistributable binary package?

So the same as what we do for Windows here:

https://github.com/r-spatial/sf/blob/e6a6a4d3e1f1ca3c42c0b0f6feaa245fc3530bae/src/Makevars.win#L30-L31

Because from the mac binaries, it seems that proj/gdal were successfully statically linked, but the configure script did not copy over the datum files into the package.

It is unclear to me how this is supposed to happen. Or does the CRAN builder have some manual hack script to insert those things after the package has been built?

Check logs:

rsbivand commented 3 years ago

No, no CRAN hack. In rgdal this is done towards the end of configure.ac and controlled by the configure argument --with-data-copy, typically used on macOS, in addition to the winlibs target in src/Makevars.win:

# Optional local copy of GDAL datadir and PROJ_LIB

data_copy=no
AC_ARG_WITH([data-copy],
    AC_HELP_STRING([--with-data-copy=yes/no],
               [local copy of data directories in package, default no]),
               [data_copy=$withval])
if test "${data_copy}" = "yes" ; then
AC_MSG_NOTICE([Copy data for:])
  proj_lib0="${PROJ_LIB}"
  AC_ARG_WITH([proj-data],
    AC_HELP_STRING([--with-proj-data=DIR],
                   [location of PROJ.4 data directory]),
    [proj_lib1=$withval])
  if test -n "${proj_lib0}" ; then
    proj_lib="${proj_lib0}"
  else
    proj_lib="${proj_lib1}"
  fi
  if test -n "${proj_lib}" ; then
    if test -d "${proj_lib}" ; then
      cp -r "${proj_lib}" "${R_PACKAGE_DIR}"
      AC_MSG_NOTICE([  PROJ.4: ${proj_lib}])
    else
      AC_MSG_ERROR([PROJ.4 data files not found; set environment variable PROJ_LIB=DIR or --with-proj-data=DIR.])
    fi
  else
      AC_MSG_ERROR([PROJ.4 data files not found; set environment variable PROJ_LIB=DIR or --with-proj-data=DIR.])
  fi

  if test -d "${GDAL_DATADIR}" ; then
    cp -r "${GDAL_DATADIR}" "${R_PACKAGE_DIR}"
    AC_MSG_NOTICE([  GDAL: ${GDAL_DATADIR}])
  else
    AC_MSG_ERROR([GDAL data files not found.])
  fi
fi

rsbivand commented 3 years ago

For systems with installed PROJ/GDAL, --with-data-copy should rather not be used; this also affects OSGeo4W systems. Local copies in R packages also get duplicated - CRAN have suggested an optional stand-alone data package with just inst/proj/ and inst/gdal/ to avoid multiple CRAN binaries shipping copies of the same files.
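Stripped of the autoconf macros, the --with-data-copy logic from the rgdal configure.ac fragment amounts to roughly the following. This is an illustrative sketch only: `copy_data_dirs` and its argument order are hypothetical, and unlike the original it checks both source directories before copying anything.

```shell
# Hypothetical stand-alone version of the data-copy step; not part of rgdal or sf.
# $1: PROJ data dir (PROJ_LIB or --with-proj-data in the configure script)
# $2: GDAL data dir (GDAL_DATADIR in the configure script)
# $3: destination   (R_PACKAGE_DIR in the configure script)
copy_data_dirs() {
  proj_lib="$1"
  gdal_datadir="$2"
  dest="$3"
  if [ -z "$proj_lib" ] || [ ! -d "$proj_lib" ]; then
    echo "PROJ data files not found" >&2
    return 1
  fi
  if [ ! -d "$gdal_datadir" ]; then
    echo "GDAL data files not found" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Without a trailing slash on the source, each directory is copied as a
  # subdirectory of the destination (for example dest/proj and dest/gdal).
  cp -r "$proj_lib" "$dest"
  cp -r "$gdal_datadir" "$dest"
}

# Example call (paths are placeholders):
# copy_data_dirs /usr/local/share/proj /usr/local/share/gdal /path/to/R/library/pkg
```

The copied directories then travel with the binary package, which is what makes it redistributable to machines without a system-wide PROJ/GDAL installation.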

jeroen commented 3 years ago

Right, but on r-universe we want to build redistributable binary R packages identical to those from CRAN. So we need to copy the data files.

rsbivand commented 3 years ago

Copying the data files would be needed on systems other than Windows and MacOS only if you build static there too. Lots of other OSGeo software uses PROJ/GDAL/GEOS, and they expect the metadata to be systemwide on Linux, etc. Is the argument only redistributability eg. for cloud settings? This may rather point us urgently to making the data files a separate package, right?

edzer commented 3 years ago

A data package will have the challenge that you need to verify its correctness (does the version match that of the software?); also, for GDAL the gain would be rather modest in terms of package size reduction; PROJ is moving away from packaged data distributions (access grids on CDN).

The goal of a static build may be independence of everything. Do you know how conda and Python virtual envs do this?

rsbivand commented 3 years ago

Conda etc. are close to random, to judge from their total absence from discussions on the PROJ/GDAL/GEOS lists. For PROJ, EPSG versions matter and are not really in step with PROJ releases. I think that the only tripup recently was that proj.db table and field names changed from versions 9 to 10, so keeping the external binaries and their data synchronised did matter.

edzer commented 3 years ago

So having a data package on CRAN with proj.db would require OSX and Windows builds to use identical PROJ versions. That kicks out PROJ.

rsbivand commented 3 years ago

Well, so far we haven't updated static PROJ so often. But yes, keeping the data and the package versions matched would need checking.

jeroen commented 3 years ago

I think it would be nearly impossible to orchestrate updates to gdal and proj if we have 10 or 20 R packages that may be linked to various versions of gdal/proj, but all have to use the same shared datum files.

I think it makes sense to ship compatible datum files along with each R binary package that has been statically linked to a given gdal/proj version, as we do currently.

Conda is different because they also provide conda packages with dynamic libraries for gdal and proj, and the R packages are built to dynamically link to that. So it is more similar to linux, but effectively conda replaces both the CRAN binaries and the apt/yum binaries. But in practice, this creates as many new problems as it solves.

jeroen commented 3 years ago

This problem has disappeared now that we are properly including the datum files with the package 🎉