r-spatial / discuss

a discussion repository: raise issues, or contribute!
54 stars 12 forks source link

Side effects upon loading rgdal/sf affecting GDAL utils functionality on Windows system with GDAL3 #31

Open lbusett opened 4 years ago

lbusett commented 4 years ago

Hi all,

sorry for the long message, but the problem is a bit complicated to explain.

I noticed this problem because at the moment the “spatial stack” on Windows is in a sorry state, where rgdal/sp/sf binaries are linked to GDAL2/PROJ4 while the OSGEO installer/updater (IMO the “best option” to install spatial libraries on Windows) already provides GDAL3/PROJ6 on new installs, or performs updates on old installs.

This creates problems if a R user/package wants to use both rgdal/sf functionality AND gdal binary utilities (e.g., gdalwarp, ogr2ogr) called either “directly” with system2, or through the gdalUtils package, due to side-effects happening when loading/importing rgdal/sf.

In particular, there appears to be problems related with rgdal and sf modifying the PROJ_LIB environment variable on load, happening here and here.

I’ll try to clarify with an example:

On a recently updated windows machine with GDAL3/PROJ6 and all R packages up to date, in order for GDAL projection stuff to work the PROJ_LIB environment variable needs to be set to the “share/proj” subfolder of the OSgeo install folder.

If this is the case, on a “fresh” R session I can use gdal binaries from R with no problems:

# What is the "PROJ_LIB" variable?
# At the beginning, on a fresh 'R' session, it is the "correct" one for GDAL3/PROJ6
Sys.getenv("PROJ_LIB")
#> [1] "C:\\OSGeo4W64\\share\\proj"

# Get info on a file

in_file <- system.file("vectors", "cities.shp", package = "rgdal")
info <- system2("C:/OSGeo4W64/bin/gdalsrsinfo.exe",
                args = c(in_file, "-o proj4"),
                stderr = TRUE, stdout = TRUE)
info
#> [1] ""                                    "+proj=longlat +datum=WGS84 +no_defs"
#> [3] ""

# Perform a reprojection using 'ogr2ogr'

outfile <- tempfile(fileext = ".gpkg")
reproj  <- system2("C:/OSGeo4W64/bin/ogr2ogr.exe",
                   args = c(outfile, in_file, "-t_srs EPSG:3035"),
                   stderr = TRUE, stdout = TRUE)
reproj
#> character(0)

sf::st_read(outfile)
#> Reading layer `cities' from data source `C:\Users\LB\AppData\Local\Temp\RtmpCWStMT\file149448537612.gpkg' using driver `GPKG'
#> Simple feature collection with 606 features and 4 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -5118229 ymin: -5398972 xmax: 16642620 ymax: 13173490
#> epsg (SRID):    3035
#> proj4string:    +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs

However, as soon as rgdal or sf are loaded we start having problems , because the environment variable is changed so to point to the rgdal/proj subfolder of the gdal (or sf) package installation folder:

library(rgdal)
#> Loading required package: sp
#> rgdal: version: 1.4-8, (SVN revision 845)
#>  Geospatial Data Abstraction Library extensions to R successfully loaded
#>  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
#>  Path to GDAL shared files: C:/Users/LB/Documents/R/win-library/3.6/rgdal/gdal
#>  GDAL binary built with GEOS: TRUE 
#>  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
#>  Path to PROJ.4 shared files: C:/Users/LB/Documents/R/win-library/3.6/rgdal/proj
#>  Linking to sp version: 1.3-2

Sys.getenv("PROJ_LIB")
#> [1] "C:/Users/LB/Documents/R/win-library/3.6/rgdal/proj"

, leading the system calls to unexpectedly error because now PROJ_LIB points to the “proj folder” provided by rgdal, which is currently still PROJ4, while the binaries need to use that related to PROJ6.

info <- system2("C:/OSGeo4W64/bin/gdalsrsinfo.exe",
                args = c(in_file, "-o proj4"),
                stderr = TRUE, stdout = TRUE)
info
#> [1] "ERROR 1: PROJ: proj_as_proj_string: Cannot find proj.db"
#> [2] ""                                                       
#> [3] "+proj=longlat +datum=WGS84 +no_defs"                    
#> [4] ""

outfile <- tempfile(fileext = ".gpkg")
reproj  <- system2("C:/OSGeo4W64/bin/ogr2ogr.exe",
                   args = c(outfile, in_file, "-t_srs EPSG:3035"),
                   stderr = TRUE, stdout = TRUE)
#> Warning in system2("C:/OSGeo4W64/bin/ogr2ogr.exe", args = c(outfile,
#> in_file, : running command '"C:/OSGeo4W64/bin/ogr2ogr.exe" C:
#> \Users\LB\AppData\Local\Temp\RtmpCWStMT\file14943c847c59.gpkg C:/Users/LB/
#> Documents/R/win-library/3.6/rgdal/vectors/cities.shp -t_srs EPSG:3035' had
#> status 1
reproj
#> [1] "ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db"
#> [2] "ERROR 1: Failed to process SRS definition: EPSG:3035"         
#> attr(,"status")
#> [1] 1
sf::st_read(outfile)
#> Error: Cannot open "C:\Users\LB\AppData\Local\Temp\RtmpCWStMT\file14943c847c59.gpkg"; 
#> The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.

The same problem currently prevents the use of gdalUtils functionality on Windows machines with GDAL3 installed. Infact, because gdalUtils imports rgdal, even starting from a “fresh” session we get this:

# At the beginning, PROJ_LIB is "correct" for GDAL3/PROJ6
Sys.getenv("PROJ_LIB")
#> [1] "C:\\OSGeo4W64\\share\\proj"

in_file <- system.file("vectors", "cities.shp", package = "rgdal")
outfile <- tempfile(fileext = ".gpkg")
reproj <- gdalUtils::ogr2ogr(in_file, outfile, t_srs = "EPSG:3035", verbose = TRUE)
#> Warning in system(cmd, intern = TRUE): running command '"C:
#>Checking gdal_installation...
#>Scanning for GDAL installations...
#>Checking the gdalUtils_gdalPath option...
#>GDAL version 3.0.2
#>GDAL command being used: "C:\OSGeo4W64\bin\ogr2ogr.exe" -t_srs "EPSG:3035," 
#>"C:\Users\LB\AppData\Local\Temp\RtmpCwTlHH\file7fc4c3660f9.gpkg" 
#>"C:/Users/LB/Documents/R/win-library/3.6/rgdal/vectors/cities.shp" 

#>Warning message:
#>In system(cmd, intern = TRUE) :
#>  running command '"C:\OSGeo4W64\bin\ogr2ogr.exe" -t_srs "EPSG:3035," 
#>"C:\Users\LB\AppData\Local\Temp\RtmpCwTlHH\file7fc4c3660f9.gpkg" 
#>"C:/Users/LB/Documents/R/win-library/3.6/rgdal/vectors/cities.shp" ' had status 1

reproj
#>[1] "ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db"
#>[2] "ERROR 1: Failed to process SRS definition: EPSG:3035"         
#>attr(,"status")
#>[1] 1

because, apparently, as soon as gdalUtils::ogr2ogr is called the PROJ_LIB variable is changed:

Sys.getenv("PROJ_LIB")
#> [1] "C:/Users/LB/Documents/R/win-library/3.6/rgdal/proj"

Now, although I agree with @rsbivand who strongly suggests to have only one “version” of GDAL libraries installed and used, in particular for unexperienced users, I still think that this behaviour is problematic, for two reasons:

  1. On windows, installing from OSGEO is the “way to go” to get GDAL, for both beginners and advanced users. Therefore, until the R binaries start relying on GDAL 3, there may be problems in “mixing” rgdal/gdalUtils/gdal binaries functionalities, and packages exploiting that “mix” could have unexpected breakage. Advanced users can still “downgrade” their GDAL system installation (although it is not ideal), but I fear that less experieced ones would struggle. Also, I know that packages authors could switch to using gdalUtilities or sf::gdal_utils to avoid the problem (I am already doing that within MODIStsp), but 1) unfortunately not all gdal binary utilities are available in gdalUtilities (see for example https://github.com/r-spatial/sf/issues/1213\#issuecomment-563977401), and 2) that would not solve the issue of packages currently relying on gdalUtils or system calls and unable/unwilling to do the switch.

  2. Even when the binaries will catch-up on GDAL3/PROJ6, I think that this kind of side effect should be avoided (and I think it is discouraged by CRAN policies, though I may be mistaken).

What is your take on this? I think a way to solve the problem could be to exploit Sys.setenv followed by on.exit calls in any function “requiring” PROJ_LIB to be properly set for rgdal/sf use, so that the environment variable can be automatically reset to its “former” value. Something like:

oldprojlib <- Sys.getenv("PROJ_LIB")
Sys.setenv(PROJ_LIB = "whatevere_needed_by_rgdal")
on.exit(Sys.setenv(PROJ_LIB = oldprojlib))

This seems however a bit cumbersome, because it would need to be replicated several times. Do you see any other way?

Hope this helps, and that I am not missing something obvious here.

Lorenzo

Created on 2019-12-12 by the reprex package (v0.3.0)

lbusett commented 4 years ago

@lovalery

I may be wrong , but I seem to remember that in the "old" RQGIS you needed to pass full paths to the input files, while here you are using a relative path to the input raster. Did you try that? Or even tried to pass a "R" raster object rather than a filename?

lovalery commented 4 years ago

@lbusett Thank you very much Lorenzo for these test proposals. I did them but unfortunately none of them succeeded. Please find them below. In any case, if you or anyone else have any other ideas, don't hesitate. I am available to perform all necessary tests.

> # 1- Setting of the environment and opening of the application (with the same double error message)
> set_env(root="C:/Program Files/QGIS 3.12")                    
$root
[1] "C:/Program Files/QGIS 3.12"

$qgis_prefix_path
[1] "C:/Program Files/QGIS 3.12/apps/qgis"

$python_plugins
[1] "C:/Program Files/QGIS 3.12/apps/qgis/python/plugins"

$platform
[1] "Windows"

> open_app()                                                  
proj_create_from_database: Cannot find proj.db
proj_create_from_database: Cannot find proj.db
> # 2- TEST 1 - Filling of the mandatory parameters for the function "polygonize" 
                with **absolute path** for the input file and **.shp** for the output file :

params<-get_args_man(alg="gdal:polygonize")
> params$INPUT<-"D:/test/pz5etr_bs_1500_0.9_0.1.tif"
> params$BAND<-1
> params$FIELD<-"DN"
> params$EIGHT_CONNECTEDNESS<-TRUE
> params$OUTPUT<-file.path("D:/test", "seg_poly2.shp")
> seg_poly <- run_qgis(alg = "gdal:polygonize",
+                 params = params,
+                 load_output = TRUE)

proj_create_from_wkt: Cannot find proj.db
proj_identify: Cannot find proj.db
$OUTPUT
[1] "D:/test/seg_poly2.shp"

Error in FUN(X[[i]], ...) : 
  Unfortunately, QGIS did not produce: D:/test/seg_poly2.shp
> # 3- TEST 2 - Filling of the mandatory parameters for the function "polygonize" 
                with **absolute path** for the input file and **.gpkg** for the output file :

params<-get_args_man(alg="gdal:polygonize")
> params$INPUT<-"D:/test/pz5etr_bs_1500_0.9_0.1.tif"
> params$BAND<-1
> params$FIELD<-"DN"
> params$EIGHT_CONNECTEDNESS<-TRUE
> params$OUTPUT<-file.path("D:/test", "seg_poly2.gpkg")
> seg_poly <- run_qgis(alg = "gdal:polygonize",
+                 params = params,
+                 load_output = TRUE)

$OUTPUT
[1] "D:/test/seg_poly2.gpkg"

Error in FUN(X[[i]], ...) : 
  Unfortunately, QGIS did not produce: D:/test/seg_poly2.gpkg
> # 4- TEST 3 -  Filling of the mandatory parameters for the function "polygonize" 
                with a **"R" raster object** and **.gpkg** for the output file :

># Creating the raster
> r<-raster("pz5etr_bs_1500_0.9_0.1.tif")
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
  Discarded datum Unknown based on GRS80 ellipsoid in CRS definition
> r
class      : RasterLayer 
dimensions : 1773, 2834, 5024682  (nrow, ncol, ncell)
resolution : 0.0185259, 0.0185261  (x, y)
extent     : 216405.5, 216458, 6849399, 6849432  (xmin, xmax, ymin, ymax)
crs        : +proj=lcc +lat_0=46.5 +lon_0=3 +lat_1=49 +lat_2=44 +x_0=700000 +y_0=6600000 +ellps=GRS80 +units=m +no_defs 
source     : D:/test/pz5etr_bs_1500_0.9_0.1.tif 
names      : pz5etr_bs_1500_0.9_0.1 

> # Filling the mandatory parameters and running QGIS
> params$INPUT<-r
> params$BAND<-1
> params$FIELD<-"DN"
> params$EIGHT_CONNECTEDNESS<-TRUE
> params$OUTPUT<-file.path("D:/test", "seg_poly2.gpkg")
> seg_poly <- run_qgis(alg = "gdal:polygonize",
+                 params = params,
+                 load_output = TRUE)
$OUTPUT
[1] "D:/test/seg_poly2.gpkg"

Error in FUN(X[[i]], ...) : 
  Unfortunately, QGIS did not produce: D:/test/seg_poly2.gpkg
De plus : Warning messages:
1: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
  Discarded datum Unknown based on GRS80 ellipsoid in CRS definition
2: In .gd_SetProject(object, ...) : NOT UPDATED FOR PROJ >= 6
3: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
  Discarded datum Unknown based on GRS80 ellipsoid in CRS definition

> # 5- TEST 4 -  Filling of the mandatory parameters for the function "polygonize" 
                with a "**R" raster object** and **.shp** for the output file :

> # Filling the mandatory parameters and running QGIS
> params<-get_args_man(alg="gdal:polygonize")
> params$INPUT<-r
> params$BAND<-1
> params$FIELD<-"DN"
> params$EIGHT_CONNECTEDNESS<-TRUE
> params$OUTPUT<-file.path("D:/test", "seg_poly2.shp")
> seg_poly <- run_qgis(alg = "gdal:polygonize",
+                 params = params,
+                 load_output = TRUE)

$OUTPUT
[1] "D:/test/seg_poly2.shp"

Error in FUN(X[[i]], ...) : 
  Unfortunately, QGIS did not produce: D:/test/seg_poly2.shp
De plus : Warning messages:
1: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
  Discarded datum Unknown based on GRS80 ellipsoid in CRS definition
2: In .gd_SetProject(object, ...) : NOT UPDATED FOR PROJ >= 6
3: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
  Discarded datum Unknown based on GRS80 ellipsoid in CRS definition