r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
553 stars 94 forks source link

pkg stars breaks on CRAN / UCRT #453

Closed edzer closed 2 years ago

edzer commented 3 years ago

Package stars now breaks on CRAN/UCRT, quite consistently, on

    > r = read_stars(system.file("nc/reduced.nc", package = "stars"))
    trying to read file: C:\msys64\home\tomas\ucrt3\svn\ucrt3\r_packages\pkgcheck\CRAN\stars\stars.Rcheck\stars\nc\reduced.nc
    Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
     file not found
    Calls: read_stars ... is_big -> read_stars -> <Anonymous> -> CPL_read_gdal

The call to CPL_read_gdal() takes place in package sf, which calls GDAL, which (in this case) in turns does a call to the NetCDF driver which calls the netcdf library. I had hoped that this would have gone away by @jeroen 's https://github.com/r-spatial/sf/pull/1734 which meant linking sf to UCRT compiled libraries; since that PR, sf passes all it's checks on UCRT (but those do not include reading netcdf files).

@kalibera do you have any idea what I can do to make it pass this check? And how I could do experiements (like: is there a CRAN-like win-builder for UCRT at the horizon)?

kalibera commented 3 years ago

I confirm the file (reduced.nc) exists and is being read (that I see on process monitor). It would be good to find out why GDALOpenEx is unhappy about the file - I will have a closer look later.

If you wanted to try yourself non-interactively on an external service, you can try with github actions (see https://github.com/kalibera/ucrt3) and please report any issues with that. I belio

If you wanted to try yourself interactively, you can install the toolchain and the experimental build manually (https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html) if you have access to Windows. Or, you can install into a free trial VM if you don't or prefer to do it on an isolated system (https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/vm.html - the VM installation is automated and the documentation/code has been tested by at least two external people).

jeroen commented 3 years ago

The problem in stars had disappeared after the release of sf 1.0-2. However Tomas later started to apply a patch on his side to revert https://github.com/r-spatial/sf/pull/1734 and instead force the package to build against Tomas' local version of GDAL at which point the problem reappeared. I think his version of GDAL is either too old or missing some drivers, but it is hard to say.

I double checked that everything works 100% with our thoroughly tested version of GDAL from rwinlib, but Tomas insists we do things his way now, so sadly there isn't much I can do anymore to help out.

edzer commented 3 years ago

Thanks, both, for the help and suggestions. @kalibera has there been a public announcement about CRAN UCRT builds no longer using rwinlib that I missed? I haven't tried yet but guess that (re)installing netcdf will not do the job in this case, what is needed is a GDAL that has been built against netcdf: GDAL is a library of libraries, the way it was built determines which drivers it provides. @jeroen 's rwinlib build for GDAL is documented here: https://github.com/rwinlib/gdal3 and reflects which set of drivers current windows R users expect to be present. Reducing this set would break a lot of existing workflows.

It would be good if it were documented how GDAL/GEOS/PROJ were built on UCRT / MXE, which drivers are linked, and which versions of GDAL/GEOS/PROJ were used, without having to dig through the copy of the MXE build VM or having to print this in GitHub actions. There is a long and growing list of packages that now use these system requirements directly; an incomplete list is found here.

kalibera commented 3 years ago

The way gdal/geos/proj are compiled and their versions can be found here, the build is automated.

https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/gdal.mk https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/geos.mk https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/proj.mk

I've upgraded the netcdf library in the MXE toolchain, which meant gdal was rebuilt as well. I will try adding more drivers. Clearly it would be easier if the error message could be a bit more informative (which I don't know if it could) , e.g. "driver xxx missing" if that is the case, rather than "file not found" (when the file is present).

The configure output of gdal is as follows.

  C++14 support:             no

  LIBTOOL support:           yes

  LIBZ support:              external
  LIBLZMA support:           no
  ZSTD support:              yes
  cryptopp support:          no
  crypto/openssl support:    yes
  GRASS support:             no
  CFITSIO support:           no
  PCRaster support:          no
  LIBPNG support:            external
  DDS support:               no
  GTA support:               yes
  LIBTIFF support:           external (BigTIFF=yes)
  LIBGEOTIFF support:        external
  LIBJPEG support:           external
  12 bit JPEG:               no
  12 bit JPEG-in-TIFF:       no
  LIBGIF support:            external
  JPEG-Lossless/CharLS:      no
  OGDI support:              no
  HDF4 support:              yes
  HDF5 support:              yes
  Kea support:               no
  NetCDF support:            no
  Kakadu support:            no
  JasPer support:            no
  OpenJPEG support:          yes
  ECW support:               no
  MrSID support:             no
  MrSID/MG4 Lidar support:   no
  JP2Lura support:           no
  MSG support:               no
  EPSILON support:           no
  WebP support:              yes
  cURL support (wms/wcs/...):yes
  PostgreSQL support:        no
  LERC support:              yes
  MySQL support:             yes
  Ingres support:            no
  Xerces-C support:          no
  Expat support:             yes
  libxml2 support:           yes
 Google libkml support:     no
  ODBC support:              no
  FGDB support:              no
  MDB support:               no
  PCIDSK support:            no
  OCI support:               no
  GEORASTER support:         no
  SDE support:               no
  Rasdaman support:          no
  DODS support:              no
  SQLite support:            yes
  PCRE support:              yes
  SpatiaLite support:        no
  RasterLite2 support:       no
  Teigha (DWG and DGNv8):    no
  INFORMIX DataBlade support:no
  GEOS support:              no
  SFCGAL support:            no
  QHull support:             internal
  Poppler support:           no
  Podofo support:            no
  PDFium support:            no
  OpenCL support:            no
  Armadillo support:         yes
  FreeXL support:            yes
  SOSI support:              no
  MongoDB support:           no
  MongoCXX v3 support:       no
  HDFS support:              no
  TileDB support:            no
  userfaultfd support:       no
  misc. gdal formats:        aaigrid adrg aigrid airsar arg blx bmp bsb cals ceos ceos2 coasp cosar ctg dimap dted e00grid elas envisat ers fit gff gsg gxf hf2 idrisi ignfheightasciigrid ilwis ingr iris iso8211 jaxapalsar jdem kmlsuperoverlay l1b leveller map mrf msgn ngsgeoid nitf northwood pds prf r raw rmf rs2 safe saga sdts sentinel2 sgi sigdem srtmhgt terragen til tsx usgsdem xpm xyz zmap rik ozi grib eeda plmosaic rda wcs wms wmts daas rasterlite mbtiles pdf
  disabled gdal formats:    
  misc. ogr formats:         aeronavfaa arcgen avc bna cad csv dgn dxf edigeo geoconcept georss gml gmt gpsbabel gpx gtm htf jml mvt ntf openair openfilegdb pgdump rec s57 segukooa segy selafin shape sua svg sxf tiger vdv wasp xplane idrisi pds sdts ods xlsx amigocloud carto cloudant couchdb csw elastic gft ngw plscenes wfs gpkg vfk osm
  disabled ogr formats:     

  SWIG Bindings:             no

  PROJ >= 6:                 yes
  enable GNM building:       yes
  enable pthread support:    no
  enable POSIX iconv support:yes
  hide internal symbols:     no
edzer commented 3 years ago

Thanks; that indicates that MXE GDAL is in any case missing netcdf and PostgreSQL support, as well as misc. ogr formats

> setdiff(winlib_ogr, mxe_ogr)
[1] "odbc"         "pgeo"         "mssqlspatial" "geomedia"     "walk"  

I'm trying to digest & compare the rest of the lists

jeroen commented 3 years ago

@kalibera Are there any technical reasons why you are patching sf and all other gdal packages to disable rwinlib GDAL binaries, as the package authors had intended? The rwinlib GDAL has a configuration that we have tested and tweaked over the years together with the users and package authors. It will work with any msvcrt or ucrt compiler, including your own gcc10_ucrt3_base, which would make the transition to ucrt much easier.

Even if you manage to fix up your current mxe version of GDAL, the spatial package authors regularly would like to update GDAL to a new version or enable extra drivers. This would become impossible if you insist that all spatial packages hardcode linker flags for your current mxe GDAL version.

kalibera commented 3 years ago

Gdal in r-winlib is now incompatible with the MXE toolchain. The current incompatibility is due to "#define _GLIBCXX_HAVE_AT_QUICK_EXIT 1" in the MXE toolchain but not present in Rtools4 (when that gdal was built). Just yesterday's release of rgdal which started linking against the downloaded binaries from r-winlib stopped working (cannot be even built).

Static libraries need to be built by the same toolchain, always, when they are to be linked together on Windows. Moving from MSVCRT to UCRT is an extreme case when everything becomes incompatible, but smaller updates can happen any time which only make something incompatible.

Things should be made to work with libraries available with the toolchain, but for that it is really needed that those libraries are used by the packages. If someone needs more frequent updates that is feasible this way, then like on other platforms, there is still the option of building from source while building the package. This pattern is was common particularly in Bioconductor packages and made the switch to UCRT much easier (for the packages needed by CRAN, that is those I am building).

mmaechler commented 3 years ago

... but Tomas insists we do things his way now, so sadly there isn't much I can do anymore to help out.

It's not just his way. We've never been happy with the concept of r-winlib providing binaries that are downloaded at install time of a source package installation on Windows. I'm pretty sure you've heard about this unhappyness more than once and a relatively long time ago, i.e., before even the update from Rtools(3) to Rtools4. Many, me included, are very happy for all the service you've provided to package maintainers, notably on Windows, and to all R for Windows users and you deserve to be applauded for this and other such services to the R community.

However, that part (downloading binaries from R-winlib) has always been a dark spot .. that it seems you've never been interested fixing. AFAIK Tomas @kalibera has explained to you that it's not true that such a scheme will always work, rather to the contrary, that you cannot know exactly what DLLs are present on the useR's Windows and that subtle mismatch of libraries or other compiled code on Windows can occur and by Murphy's law will happen albeit relatively rarely. Consequently the "download DLLs from r-winlib at source install/build time"-approach is not safe.... indeed, as Tomas just wrote (1h ago, above), it can even disable building/updating packages.

jeroen commented 3 years ago

However, that part (downloading binaries from R-winlib) has always been a dark spot .. that it seems you've never been interested fixing.

The entire goal of rtools4 being based on msys2 is that we can use the standard mingw-w64 package manager that everyone else also uses, to build, distribute, and install system libraries, exactly the same way as on Linux. I have dedicated entire talks about this, including at DSC2019 but R-core and CRAN never cared to try to use this to install system libraries. So as a fallback, packages had to download a copy of these libraries manually. The lack of interest to improve this has really not been from my side.

mmaechler commented 3 years ago

@jeroen: Ok, I apologize for stating too strongly. You do mention that the pre-4 Rtools do this, and it is not good, https://jeroen.github.io/dsc2019/#17 , and that Rtools 4.0 would do this better using pacman. That was fine to read (and now re-read), but the result has not been that in Rtools4, R packages would call pacman locally when being installed and linked against the libraries that would then be present (and consistent in some sense) on their machine, but rather continue downloading DLLs from R-winlib, no? But yes, I was wrong to say you were never interested in fixing that. I'm sorry for wrongly chosen words.

jeroen commented 3 years ago

@kalibera the compile error actually is not related to the toolchains but rather a bug in recent versions of GDAL. I had reported it to them a while back, but failed to find a proper fix: https://github.com/OSGeo/gdal/pull/3720

Somehow the problem got triggered by your latest mxe build. The only reason you are not seeing this bug yourself is because your mxe version of GDAL is quite old (gdal 3.0.4 from January 2020) which did not have this bug.

I have fixed to the gdal3 rwinlib bundle, so all GDAL packages on CRAN can now be installed on any ucrt compiler without the need for patching. Of course that does not solve our differences in opinion but at least it gives users a way forward to get a properly working version of sf, rgdal, etc on ucrt, and hopefully help with the ucrt transition.

edzer commented 3 years ago

With an updated toolchain @kalibera got NetCDF reading working, but with the warning GDAL Message 1: Recode from UTF-8 to CP_ACP failed with the error: "Invalid argument". This might be related to the NetCDF GDAL driver converting file names to CP_ACP before opening, a non-merged PR that might fix this is https://github.com/OSGeo/gdal/pull/1568

yutannihilation commented 3 years ago

@jeroen Sorry to ask an off-topic question, but I'm really curious about the answer to this.

R packages would call pacman locally when being installed and linked against the libraries that would then be present (and consistent in some sense) on their machine, but rather continue downloading DLLs from R-winlib, no?

I thought we had a bright future with pacman when I saw the slides of your talk at DSC2019. Even if the MXE toolchain is chosen by default (I'm not sure if it's a good idea), I wonder a package can override it with pacman's one. Is that the CRAN policy that prohibits us to write in "anywhere else on the file system apart from the R session’s temporary directory"? Are there any sharable details?