Closed dblodgett-usgs closed 1 year ago
For build 5286-5107 of Rtools42 (and subsequent), could those testing please note that sf
src/Makefile.ucrt
needs updating to add in-lkea -lhdf5_hl -hdf5_cpp
where-lhdf5_hl
is now. In response to a user request, the KEA driver was added in this build of Rtools42. The same applies to terra, etc. rtools42-5253-5107-signed.exe is released Rtools42 matching thesrc/Makefile.ucrt
files currently present in the source packages. @kalibera is there a way of conditioning on the Rtools build number insrc/Makefile.ucrt
- if so, do you have an example of a package using version conditioning?
Yes, the UCRT3 builds (using R-devel and unreleased version of Rtools42) automatically apply patches, which also adjust the linking, but if one tests using R 4.2. and unreleased Rtools42, one has to apply the patch manually. The patches used in the last build are available here: https://www.r-project.org/nosvn/winutf8/ucrt3/patches/CRAN/
The patches change the linking unconditionally. There is no prescribed way of conditioning as far as I know and it is a rare problem (4 packages currently have patches changing linking). Perhaps you could check for presence of files (headers, libraries), a bit like configure does, so e.g. check for kea. A new major release of R might require bigger changes, but then one could in principle conditionalize on R version. An installation of Rtools42 (from installer and toolchain bundle) has meta-data in files (.version, also list of MXE packages), but I would myself choose rather checks for presence of files as they are more general and portable. If anyone still had preference for the meta-data files, I think the right way would be to first discuss with the CRAN repository maintainers.
Ok, @kalibera , hot patching makes good sense, thanks.
Here are binary builds of terra and sf with the 5336 libgdal - one 7z archive for each of R-devel and R-4.2.1: https://drive.google.com/drive/folders/1aG5XsjkPAO35aUWLSImlB8_eWg47Gxno?usp=sharing. Please do not trust these too much, they pass R CMD check, but will not be identical with those from the actual build system. The package source code was manually patched for the reinstated KEA driver.
Gdrive folder updated with copies of the source tarballs and fresh builds of the binaries from these tarballs, about Sept. 15, 10:30 CEST.
Thanks, I am now working on a bigger update of Rtools42, which will include more than this work-around, so the packages built on my system will get overwritten with even newer ones. So your builds may be good for reference if anyone needed to test only this change and nothing more.
https://github.com/OSGeo/gdal/pull/6359 removes the use of std::regex in the OGR SQLite&GPKG drivers
@rouault: thanks! As I read the commit, was the previous code looking in the field string rather than the table definition for the unique
constraint declaration? A pity we didn't hit this issue before 3.5.2 was released.
@rsbivand -- sorry to be dense, how should I be installing these binary builds? I'm getting errors with devtools::install(... quick=TRUE)
in the field string rather than the table definition for the
unique
constraint declaration?
The previous version looked at fieldStr, which contained each column definition, extracted from tableDefinition, split on comma. Anyway the new version should hopefully be at least as good, and probably better, than the previous one, as I assume the previous one might have had issues if comma were found in column name or DEFAULT 'some default value, with comma' clauses.
@dblodgett-usgs install.packages("sf_1.0-9.zip")
🎉 You guys are 🧙 level.
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.5.0, PROJ 7.2.1; sf_use_s2() is TRUE
nc <- read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
names(nc)[4] <- "cnty_unique_id"
write_sf(nc, "nc_test.gpkg")
read_sf("nc_test.gpkg")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ cnty_u…¹ NAME FIPS FIPSNO CRESS…² BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # … with 90 more rows, 4 more variables: BIR79 <dbl>, SID79 <dbl>,
#> # NWBIR79 <dbl>, geom <MULTIPOLYGON [°]>, and abbreviated variable names
#> # ¹cnty_unique_id, ²CRESS_ID
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sf_1.0-9
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9.1 compiler_4.2.1 pillar_1.8.1 highr_0.9
#> [5] class_7.3-20 tools_4.2.1 digest_0.6.29 evaluate_0.16
#> [9] lifecycle_1.0.2 tibble_3.1.8 pkgconfig_2.0.3 rlang_1.0.5
#> [13] reprex_2.0.2 DBI_1.1.3 cli_3.4.0 rstudioapi_0.14
#> [17] yaml_2.3.5 xfun_0.32 fastmap_1.1.0 e1071_1.7-11
#> [21] withr_2.5.0 stringr_1.4.1 dplyr_1.0.10 knitr_1.40
#> [25] generics_0.1.3 fs_1.5.2 vctrs_0.4.1 tidyselect_1.1.2
#> [29] classInt_0.4-7 grid_4.2.1 glue_1.6.2 R6_2.5.1
#> [33] fansi_1.0.3 rmarkdown_2.16 purrr_0.3.4 magrittr_2.0.3
#> [37] htmltools_0.5.3 units_0.8-0 KernSmooth_2.23-20 utf8_1.2.2
#> [41] stringi_1.7.8 proxy_0.4-27
sf_extSoftVersion()
#> GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
#> "3.9.1" "3.5.0" "7.2.1" "true" "true"
#> PROJ
#> "7.2.1"
Created on 2022-09-14 with reprex v2.0.2
Thanks! I'll update the 7z's tomorrow with sources (done 15/9 10:30 CEST). Actually the issue has helped improve GDAL too, so a "unique" contribution.
To catch things like this, sooner, on Windows, one would have to run tests in UTF-8, for which one would have to use UCRT and a UCRT toolchain, and one would have to set UTF-8 to be the native encoding via the fusion manifest (at build time via the manifest file, as R does it). It could be done using R/R packages and Rtools42 (MXE with some updates). There are number of patches that are now being done downstream to make gdal work (https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/gdal-1-fixes.patch) and as always, it would be nice if the upstream code already could be built and used with UCRT without such downstream patching.
So, it would be nice if the upstream MXE could have up to date versions of gdal etc (not having to patch downstream in Rtools42) and if upstream gdal etc could work with UCRT/UTF-8 (not having to patch downstream in Rtools42). That would increase the chances problems are found in time and solved properly.
@kalibera This recalls https://blog.r-project.org/2019/03/28/use-of-c-in-packages/index.html, but here the problem has been rushing forward without awareness that saying std::
doesn't mean problem solved for ever across changing platforms. Now we probably need to screen both packages and external software for use of std::regex
not only for UCRT but because it may be withdrawn in the future for all platforms; @eddelbuettel, might Rcpp screen relevant packages? For external software, we have Windows and macOS binary builds with our own builds of libraries, but otherwise we'll have to rely on the upstream packagers or library maintainers. Here the rapid response by @rouault is a great example.
My blog was primarily against using C++ to interface with R, not against C++ in principle. If a library independent on R internally uses C++, without ever calling back to R (or allocating from R heap, throwing exceptions/jumping around R stack frames), that's ok. Yes, checking C++ in R packages for potential use of std::regex would be useful.
Yes, checking C++ in R packages for potential use of std::regex would be useful.
So let R CMD check
do it. Rcpp
has nothing to do with any of this, neither mandates nor promotes any C++ components. And of course only uses .Call()
, a C interface, to communicate with R.
So CRAN and Bioc teams would be the ones to check.
Here is another GDAL issue that appears to be related to the R toolchain on Windows. In short, this HDF5 file can be read on linux but not windows. This problem happens with "terra" and also with "stars".
LINUX
x <- rast("hd2011010000.scu")
#Warning message:
#[rast] unknown extent
y <- x * 1
y
#class : SpatRaster
#dimensions : 536, 536, 1 (nrow, ncol, nlyr)
#resolution : 1, 1 (x, y)
#extent : 0, 536, 0, 536 (xmin, xmax, ymin, ymax)
#coord. ref. :
#source : memory
#name : hd2011010000
#min value : -999
#max value : 15
WINDOWS
x <- rast("hd2011010000.scu")
#Warning message:
#[rast] unknown extent
y <- x * 1
#Warning messages:
#1: H5Dread() failed for block. (GDAL error 1)
#2: HDF5:"hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X offset 0, Y offset 0:
# H5Dread() failed for block. (GDAL error 1)
y
#class : SpatRaster
#dimensions : 536, 536, 1 (nrow, ncol, nlyr)
#resolution : 1, 1 (x, y)
#extent : 0, 536, 0, 536 (xmin, xmax, ymin, ymax)
#coord. ref. :
The second set of warnings should really be an Error because y
has no values. and that is what "stars" give you:
s <- stars::read_stars("hd2011010000.scu")
#Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
# read failure
#In addition: Warning messages:
#1: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
# GDAL Error 1: H5Dread() failed for block.
#2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
# GDAL Error 1: HDF5:"hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X #offset 0, Y offset 0: H5Dread() failed for block.
@rhijmans which versions of the library used by the driver are being used? The windows version will be that of MXE in the ucrt3 builds, presumably released Rtools42, not my test build with the unreleased version of libgdal, which version in your linux, and how was thst gdal built and installed?
Here is another GDAL issue that appears to be related to the R toolchain on Windows.
Please note that std::regex is unreliable also on other platforms (seen on macOS at least with R packages as I've learned now). It is not specific to Windows (nor to Rtools42).
@rsbivand: This fails on windows with the old (in R 3.6, GDAL 3.2.1) and new (UCRT) RTools using GDAL 3.4.3. It works for me on Ubuntu or Debian with GDAL 2.2.3, 2.4.0, and 3.4.0, and you showed that it works in Fedora with GDAL 3.5.2. (and you refer to the HDF lib versions as well, which may be helpful).
@kalibera: point well taken that the std::regex problems are not a Rtools issue. But this seems to be caused by how the HDF5 lib is built for R on windows.
@rouault solved a (perhaps) similar issue with HDF and OSGeo4W
@rhijmans From reading the OSGeo4W issue, it may be that there is something odd happening when HDF5 is static rather than dynamic too. It would be interesting to see whether the error seen in R on Windows binary packages can be reproduced without R and the packages using MXE-built GDAL apps. These are not distributed, so someone will have to run a custom cross-compilation. Can you establish whether there is a useful HDF5 test suite on Windows before the HDF5 libraries are used when GDAL is built? Are there issues raised about Windows builds for HDF5 itself?
This thread may be hard to follow when this is no longer about the std::regex.
Still, I think the problem with HDF5 is that the szip is not supported (it is not part of MXE), but the file uses it. The output with the current (unreleased yet) UCRT3 Rtools42 build is:
> library(terra)
terra 1.6.17
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent
> y <- x * 1
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
#000: ../../src/H5Dio.c line 192 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: ../../src/H5VLcallback.c line 2080 in H5VL_dataset_read(): dataset read failed
major: Virtual Object Layer
minor: Read failed
#002: ../../src/H5VLcallback.c line 2046 in H5VL__dataset_read(): dataset read failed
major: Virtual Object Layer
minor: Read failed
#003: ../../src/H5VLnative_dataset.c line 167 in H5VL__native_dataset_read(): can't read data
major: Dataset
minor: Read failed
#004: ../../src/H5Dio.c line 567 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#005: ../../src/H5Dchunk.c line 2594 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#006: ../../src/H5Dchunk.c line 3957 in H5D__chunk_lock(): data pipeline read failed
major: Dataset
minor: Filter operation failed
#007: ../../src/H5Z.c line 1311 in H5Z_pipeline(): required filter 'szip' is not registered
major: Data filters
minor: Read failed
#008: ../../src/H5PLint.c line 274 in H5PL_load(): search in path table failed
major: Plugin for dynamically loaded library
minor: Can't get value
#009: ../../src/H5PLpath.c line 604 in H5PL__find_plugin_in_path_table(): search in path C:\Progra
mData\hdf5\lib\plugin encountered an error
major: Plugin for dynamically loaded library
minor: Can't get value
#010: ../../src/H5PLpath.c line 734 in H5PL__find_plugin_in_path(): can't open directory
major: Plugin for dynamically loaded library
minor: Can't open directory or file
Warning messages:
1: H5Dread() failed for block. (GDAL error 1)
2: HDF5:"C:/msys64/home/tomas/hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X
offset 0, Y offset 0: H5Dread() failed for block. (GDAL error 1)
and szip is indeed disabled in the HDF5 build.
@kalibera Thanks! I followed up on https://github.com/rspatial/terra/issues/686 as the initial terra issue.
When I try to sf::st_layers() on the attached gpkg, rstudio bombs. When I list from a terminal, I see:
Reproduce with:
nav_06.zip