Closed dblodgett-usgs closed 1 year ago
Works without problems here, also in rstudio (ubuntu 20.04, RStudio 2022.07.1+554), also no issues when run under valgrind. What is your sessionInfo() after loading sf?
With Windows 10, R 4.2.1 RGui.exe and R.exe:
> sf::st_layers("nav_06.gpkg")
Driver: GPKG
Available layers:
layer_name geometry_type features fields crs_name
1 unassigned_gages Point 6 16 NAD83 / Conus Albers
2 split_events Point 91 6 NAD83 / Conus Albers
3 POIs_tmp_06 Point 2201 12 NAD83 / Conus Albers
With freshly installed RStudio-2022.07.1-554.exe no problems. There were recent reports of odd errors in packages installed from the Rstudio mirror - try reinstalling sf outside Rstudio from another mirror?
I've been double checking all my installs. This is straight from an Rterm.exe under Cmder. Now st_layers() just hangs. This is after a devtools::install_github("r-spatial/sf")
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.2.1
> sf::sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
"3.9.1" "3.4.3" "7.2.1" "true" "true"
PROJ
"7.2.1"
> sf::st_layers("nav_06.gpkg")
Does released sf have the same problem?
Yes. The output of sf::st_layers() below is just hung and unresponsive.
> library(sf)
Error in library(sf) : there is no package called 'sf'
> install.packages("sf")
Installing package into '---/R/win-library/4.2'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/sf_1.0-8.zip'
Content type 'application/zip' length 25291244 bytes (24.1 MB)
downloaded 24.1 MB
package 'sf' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
---\AppData\Local\Temp\1\Rtmpa2wXeQ\downloaded_packages
> library(sf)
Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sf_1.0-8
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 magrittr_2.0.3 units_0.8-0 tidyselect_1.1.2
[5] R6_2.5.1 rlang_1.0.4 fansi_1.0.3 dplyr_1.0.9
[9] tools_4.2.1 grid_4.2.1 KernSmooth_2.23-20 utf8_1.2.2
[13] cli_3.3.0 e1071_1.7-11 DBI_1.1.3 class_7.3-20
[17] assertthat_0.2.1 tibble_3.1.8 lifecycle_1.0.1 purrr_0.3.4
[21] vctrs_0.4.1 glue_1.6.2 proxy_0.4-27 compiler_4.2.1
[25] pillar_1.8.1 generics_0.1.3 classInt_0.4-7 pkgconfig_2.0.3
> sf::st_layers("nav_06.gpkg")
Long shot, re-install Rcpp from a CRAN mirror?
nope. I guess now's as good a time as any to reinstall all the things.
I've now completely uninstalled R and all my packages and upgraded rstudio. If I run this in a terminal, the R for Windows terminal front-end process hammers a CPU and cycles between ~500mb of memory used and 3500mb.
I'm going to step through the code that creates this and try and find what the hang up is.
Alright -- I've narrowed this down to its source. The issue is an attribute "uniqueID" that is causing the gpkg reader to hang. If I uncomment the last line, reprex hangs. If I run this in rstudio, it bombs rstudio.
While it is hung, the R for Windows process is hammering a CPU and eating between 200 and 4000MB of memory with this sawtooth pattern. Happy to do more debugging or dump some other logs if someone knows where to go looking for them.
I guess I'll stop using this "uniqueID" attribute for now. 😝
borked <- sf::read_sf('
{
"type": "FeatureCollection",
"name": "borked",
"crs": {
"type": "name",
"properties": {
"name": "urn:ogc:def:crs:EPSG::5070"
}
},
"features": [
{
"type": "Feature",
"properties": {
"COMID": 19736669,
"REACHCODE": "06010202003357",
"REACH_meas": 22.451321717399999,
"uniqueID": "03507000"
},
"geometry": {
"type": "Point",
"coordinates": [
1117044.368864378193393,
1445765.030222098575905
]
}
},
{
"type": "Feature",
"properties": {
"COMID": 19677981,
"REACHCODE": "06020002000118",
"REACH_meas": 61.4738,
"uniqueID": "03554000"
},
"geometry": {
"type": "Point",
"coordinates": [
1072594.611272452631965,
1396990.093436731025577
]
}
}
]
}
')
sf::write_sf(borked, "borked.gpkg")
sf::write_sf(dplyr::rename(borked, ID = uniqueID), "works.gpkg")
sf::read_sf("works.gpkg")
#> Simple feature collection with 2 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1072595 ymin: 1396990 xmax: 1117044 ymax: 1445765
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 2 × 5
#> COMID REACHCODE REACH_meas ID geom
#> <int> <chr> <dbl> <chr> <POINT [m]>
#> 1 19736669 06010202003357 22.5 03507000 (1117044 1445765)
#> 2 19677981 06020002000118 61.5 03554000 (1072595 1396990)
# sf::read_sf("borked.gpkg")
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 compiler_4.2.1 pillar_1.8.1 highr_0.9
#> [5] class_7.3-20 tools_4.2.1 digest_0.6.29 evaluate_0.16
#> [9] lifecycle_1.0.2 tibble_3.1.8 pkgconfig_2.0.3 rlang_1.0.5
#> [13] reprex_2.0.2 DBI_1.1.3 cli_3.4.0 rstudioapi_0.14
#> [17] yaml_2.3.5 xfun_0.32 fastmap_1.1.0 e1071_1.7-11
#> [21] withr_2.5.0 stringr_1.4.1 dplyr_1.0.10 knitr_1.40
#> [25] generics_0.1.3 fs_1.5.2 vctrs_0.4.1 tidyselect_1.1.2
#> [29] classInt_0.4-7 grid_4.2.1 glue_1.6.2 sf_1.0-8
#> [33] R6_2.5.1 fansi_1.0.3 rmarkdown_2.16 purrr_0.3.4
#> [37] magrittr_2.0.3 ellipsis_0.3.2 htmltools_0.5.3 units_0.8-0
#> [41] KernSmooth_2.23-20 utf8_1.2.2 stringi_1.7.8 proxy_0.4-27
sf::sf_extSoftVersion()
#> GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
#> "3.9.1" "3.4.3" "7.2.1" "true" "true"
#> PROJ
#> "7.2.1"
Created on 2022-09-09 with reprex v2.0.2
After some further testing, it's actually any attribute with "unique" in the name.
e.g. this causes it.
nc <- sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
nc <- dplyr::rename(nc, "cnty_unique_id" = CNTY_ID)
sf::write_sf(nc, "nc_test.gpkg")
sf::read_sf("nc_test.gpkg")
Very odd (Windows 10)
> nc <- sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
> nc <- dplyr::rename(nc, "cnty_unique_id" = CNTY_ID)
> sf::write_sf(nc, "nc_test.gpkg")
> sf::read_sf("nc_test.gpkg")
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
# A tibble: 100 × 15
AREA PERIMETER CNTY_ cnty_u…¹ NAME FIPS FIPSNO CRESS…² BIR74 SID74 NWBIR74
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
# … with 90 more rows, 4 more variables: BIR79 <dbl>, SID79 <dbl>,
# NWBIR79 <dbl>, geom <MULTIPOLYGON [°]>, and abbreviated variable names
# ¹cnty_unique_id, ²CRESS_ID
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
> names(sf::read_sf("nc_test.gpkg"))
[1] "AREA" "PERIMETER" "CNTY_" "cnty_unique_id"
[5] "NAME" "FIPS" "FIPSNO" "CRESS_ID"
[9] "BIR74" "SID74" "NWBIR74" "BIR79"
[13] "SID79" "NWBIR79" "geom"
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 magrittr_2.0.3 units_0.8-0 tidyselect_1.1.2
[5] R6_2.5.1 rlang_1.0.5 fansi_1.0.3 dplyr_1.0.10
[9] tools_4.2.1 grid_4.2.1 KernSmooth_2.23-20 utf8_1.2.2
[13] cli_3.3.0 e1071_1.7-11 DBI_1.1.3 ellipsis_0.3.2
[17] class_7.3-20 tibble_3.1.8 lifecycle_1.0.2 sf_1.0-8
[21] purrr_0.3.4 vctrs_0.4.1 glue_1.6.2 proxy_0.4-27
[25] compiler_4.2.1 pillar_1.8.1 generics_0.1.3 classInt_0.4-7
[29] pkgconfig_2.0.3
Maybe also try terra:
> terra::vect("nc_test.gpkg")
class : SpatVector
geometry : polygons
dimensions : 100, 14 (geometries, attributes)
extent : -84.32385, -75.45698, 33.88199, 36.58965 (xmin, xmax, ymin, ymax)
source : nc_test.gpkg
coord. ref. : lon/lat NAD27 (EPSG:4267)
names : AREA PERIMETER CNTY_ cnty_unique_id NAME FIPS FIPSNO
type : <num> <num> <num> <num> <chr> <chr> <num>
values : 0.114 1.442 1825 1825 Ashe 37009 3.701e+04
0.061 1.231 1827 1827 Alleghany 37005 3.7e+04
0.143 1.63 1828 1828 Surry 37171 3.717e+04
CRESS_ID BIR74 SID74 (and 4 more)
<int> <num> <num>
5 1091 1
3 487 0
86 3188 5
which uses the same GDAL binary library.
And:
> sf::read_sf("works.gpkg")
Simple feature collection with 2 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 1072595 ymin: 1396990 xmax: 1117044 ymax: 1445765
Projected CRS: NAD83 / Conus Albers
# A tibble: 2 × 5
COMID REACHCODE REACH_meas ID geom
<int> <chr> <dbl> <chr> <POINT [m]>
1 19736669 06010202003357 22.5 03507000 (1117044 1445765)
2 19677981 06020002000118 61.5 03554000 (1072595 1396990)
> sf::read_sf("borked.gpkg")
Simple feature collection with 2 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 1072595 ymin: 1396990 xmax: 1117044 ymax: 1445765
Projected CRS: NAD83 / Conus Albers
# A tibble: 2 × 5
COMID REACHCODE REACH_meas uniqueID geom
<int> <chr> <dbl> <chr> <POINT [m]>
1 19736669 06010202003357 22.5 03507000 (1117044 1445765)
2 19677981 06020002000118 61.5 03554000 (1072595 1396990)
terra::vect has the same behavior. So something in here?
> sf::sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
"3.9.1" "3.4.3" "7.2.1" "true" "true" "7.2.1"
terra copied some C++ code from sf for reading/writing OGR data, so it might be in there, or it is in GDAL (but I'm running GDAL 3.4.3 too) or in libsqlite3 (which GDAL uses to write/read GPKG), or in the windows build train (new in R 4.2: ucrt). I really have no clue - the weird thing is that rstudio triggers it, regular R doesn't, and only on windows. Can you reproduce it in rstudio on Windows, @rsbivand ?
No, nothing with a freshly installed rstudio with the original reprex. My Windows 10 version is the same too. Is any package not a CRAN binary?
No... I reinstalled R, Rstudio, and my entire library with devtools yesterday. I'll do a little more snooping here.
Do you know which mirror was used?
The default cloud one. For the record, I have installed sf with a native build (with rstudio package build tools) and it's still happening. Trying to get into my windows crash dumps to see if anything is emitted of use.
By native build, do you mean the released Rtools42? Please simplify by avoiding any Rstudio packages in any source builds, use only the very simplest route. My installs for which no errors occur are just standard CRAN binary installs, though I have Rtools42, and even (not now) test versions of MXE-built libgdal.a testing updated drivers. I have a feeling that somewhere you have a component built with a non-standard build chain.
@edzer : borked.gpkg and nc_test.gpkg OK also in rstudio.
OK -- I think this might be something? This is generated by looking at the crash dump in WinDbg.
See the lines with stuff like the following out in the middle of the stack:
sf!ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_St9_IdentityIS5_ESt4lessIS5_ESaIS5_EE16_M_insert_uniqueIS5_EESt4pairISt17_Rb_tree_iteratorIS5_EbEOT_+0x6a9d4
From sf:
From terra:
@rsbivand I mean that I installed sf by building the package with devtools::build() via Rstudio's build tools. I do have Rtools42 although I have to admit ignorance of all the nuances of the build infrastructure for compiled packages in Windows.
I ran this code from @dblodgett-usgs and I can also confirm that on Windows 10 RStudio is crashing and the R terminal is hanging (the same happens with {terra}). However, on Windows 8.1 it works OK. I have the {sf} binary version from CRAN. Maybe a different locale is the problem?
Great; I'm looking at @dblodgett-usgs stack trace for sf, and am completely puzzled why CPL_get_z_range seems to get called from within sf_from_ogrlayer. There's no reason for it, and there isn't even a call to CPL_get_z_range in that function. Should I understand STACK_TEXT as a stack trace or is it something else?
@edzer -- for the record, the sf STACK_TEXT came from a WinDbg DUMP after running the following while nc_test.gpkg does not exist -- it writes but bombs during read:
nc <- sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
nc <- dplyr::rename(nc, "cnty_unique_id" = CNTY_ID)
sf::write_sf(nc, "nc_test.gpkg")
sf::read_sf("nc_test.gpkg")
The full report is:
Progress - after re-installing sf binary from CRAN cloud mirror, I'm also seeing the sf::read_sf("nc_test.gpkg") hang. Microsoft Windows [Version 10.0.19044.1889], this morning terra too.
Edit:
Not stable progress - after switching to st_*() and omitting dplyr, I re-ran the new and original code in a for-loop, and no hangs were observed. So is this a transient case on first load of sf? Not always; now I cannot re-create the hang even in a new R session (R.exe in Windows terminal console).
On the Windows PC on which sf::read_sf("nc_test.gpkg") just failed:
> nc <- sf::st_read(system.file("gpkg/nc.gpkg", package = "sf"))
Reading layer `nc.gpkg' from data source
`C:\Users\RB\AppData\Local\R\win-library\4.2\sf\gpkg\nc.gpkg'
using driver `GPKG'
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
> names(nc)
[1] "AREA" "PERIMETER" "CNTY_" "CNTY_ID" "NAME" "FIPS"
[7] "FIPSNO" "CRESS_ID" "BIR74" "SID74" "NWBIR74" "BIR79"
[13] "SID79" "NWBIR79" "geom"
> names(nc)[4] <- "cnty_unique_id"
> names(nc)
[1] "AREA" "PERIMETER" "CNTY_" "cnty_unique_id"
[5] "NAME" "FIPS" "FIPSNO" "CRESS_ID"
[9] "BIR74" "SID74" "NWBIR74" "BIR79"
[13] "SID79" "NWBIR79" "geom"
> sf::st_write(nc, "nc_test0.gpkg")
Writing layer `nc_test0' to data source `nc_test0.gpkg' using driver `GPKG'
Writing 100 features with 14 fields and geometry type Multi Polygon.
> sf::st_read("nc_test0.gpkg")
Reading layer `nc_test0' from data source `C:\Users\RB\work\nc_test0.gpkg' using driver `GPKG'
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
>
that is:
nc <- sf::st_read(system.file("gpkg/nc.gpkg", package = "sf"))
names(nc)[4] <- "cnty_unique_id"
sf::st_write(nc, "nc_test0.gpkg")
sf::st_read("nc_test0.gpkg")
passes, as does terra::vect("nc_test0.gpkg").
@rhijmans maybe you could take a look what is going on here, being familiar with windows debugging?
@rsbivand For the record, your st_read version also bombs my rstudio.
I see the same problem (R hanging). This happens when trying to read geometries or attributes or even the srs from the "poLayer" (obtained with, e.g. "poLayer = poDS->GetLayer(0)"). That makes it difficult to further debug this for me.
For example, the below fails with the offending file
bool SpatVector::test(std::string filename) {
    GDALDataset *poDS = static_cast<GDALDataset*>(GDALOpenEx(filename.c_str(), GDAL_OF_VECTOR, NULL, NULL, NULL ));
    OGRSpatialReference *poSRS = poDS->GetLayer(0)->GetSpatialRef();
    return true;
}
You can call it like this with terra-devel
library(sf)
library(terra)
fin <- system.file("gpkg/nc.gpkg", package = "sf")
fout <- "nc_test0.gpkg"
if (!file.exists(fout)) {
nc <- sf::st_read(fin)
names(nc)[4] <- "cnty_unique_id"
sf::st_write(nc, fout)
}
v <- terra::vect()@ptr
v$test(fin)
[1] TRUE
# this hangs
v$test(fout)
@rouault did you come across a problem with GeoPackage file reading when one of the field names contains the string unique?
I suspect you might hit https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp#L351 . Do you use an outdated compiler (GCC < 5) to build GDAL? There are known issues in the early implementations of C++ regexp capabilities.
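For readers following along, here is a minimal sketch of the kind of runtime probe being discussed. The expression inside is the one that appears in the grep output of ogrsqliteutility.cpp further down this thread; the function name has_working_regex and the try/catch wrapper are assumptions made for illustration, not GDAL's code.

```cpp
#include <regex>

// Hypothetical wrapper (not GDAL code): returns true when a trivial
// alternation pattern compiles and matches with this std::regex build.
// The probe expression is the one quoted from ogrsqliteutility.cpp.
bool has_working_regex() {
    try {
        return std::regex_match("c", std::regex("a|b|c"));
    } catch (const std::regex_error&) {
        // Pattern compilation or matching threw: treat regex as unusable.
        return false;
    }
}
```

A probe this small only shows that a trivial pattern works. It says nothing about more complex patterns, so a toolchain can pass it and still misbehave on the real ones.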
As far as I can see, Rtools42 (the toolchain used to build R and R packages for windows) uses
gcc --version
#gcc (GCC) 10.3.0
can you try just with ogrinfo / ogr2ogr ? Maybe it is an issue with the mingw implementation of C++ regexp, but that would be strange as GDAL has a msys2 CI setup where testing of tables with unique in column name is done (https://github.com/OSGeo/gdal/blob/4de8becdd4ff4d656c3d7f8ab2cf3baa2a5005b9/autotest/ogr/ogr_gpkg.py#L2275) and it passes fine on it
@rhijmans or someone else with an R / windows machine: you could try ogr2ogr in the same R setup e.g. with sf::gdal_utils("vectortranslate", "nc_test0.gpkg", "nc_test0.json"); ogrinfo is not available through gdal_utils.
It crashed for me in RStudio.
I can confirm that the regex works (code here)
v$test(fin)
#regex works
#[1] TRUE
ogr2ogr via sf also hangs for me
The code at: https://github.com/OSGeo/gdal/blob/8e4c84b5103a832f5f0ca507ce2eec36f479f994/ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp#L338-L445 applies to SQLite UNIQUE constraints, but the test appears to try to handle columns/fields with toupper("unique") in the column name as unique. If in addition std::regex is buggy but is marked as OK - the test for a working std::regex is very simple - odd behaviour may follow. The specific commit seems to be: https://github.com/OSGeo/gdal/commit/455bf3c255ee78dc70116ca2e2b1a37aaecd0799#diff-4427c5e19b00d7ce062221d9d671df58c1f622d1689375ced2a6cc61dbe8c3c4. Unique constraints were added to some vector drivers from 3.2.0.
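To illustrate why a column merely named with "unique" ends up on the regex code path, here is a rough sketch of an upper-case substring check like the one described above. The helper name mentions_unique and the sample column definitions are hypothetical; this is not GDAL's implementation, only the idea of the pre-filter.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not GDAL code): upper-case a column definition and
// look for the substring "UNIQUE" anywhere in it, including inside the
// column name itself.
bool mentions_unique(std::string field_def) {
    std::transform(field_def.begin(), field_def.end(), field_def.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return field_def.find("UNIQUE") != std::string::npos;
}
```

With such a check, a definition like `"cnty_unique_id" DOUBLE` is flagged although it carries no UNIQUE constraint, while `"CNTY_ID" DOUBLE` is not. That would match the observation that only files with "unique" in a field name reach the std::regex calls.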
I can reproduce the problem in the trunk build of Rtools42 ("ucrt3"), stacktrace pointing to regex compilation in SQLGetUniqueFieldUCConstraints. I will try rebuilding Rtools42 with gdal modified to assert that the regexes are buggy (always) in this function. It would still be good to know if this is a bug in the std::regex implementation (and report if so), or in gdal.
The only uses of regex in GDAL master:
$ grep regex */*/*/*.cpp
ogr/ogrsf_frmts/hana/ogrhanatablelayer.cpp:#include <regex>
ogr/ogrsf_frmts/hana/ogrhanatablelayer.cpp: const auto regex = std::regex(R"((\w+)+\((\d+(,\d+)*)\)$)");
ogr/ogrsf_frmts/hana/ogrhanatablelayer.cpp: std::regex_search(typeDef, match, regex);
ogr/ogrsf_frmts/sqlite/ogrsqliteregexp.cpp:#include "ogrsqliteregexp.h"
ogr/ogrsf_frmts/sqlite/ogrsqliteregexp.cpp: sqlite3_result_error(ctx, "no regexp", -1);
ogr/ogrsf_frmts/sqlite/ogrsqliteregexp.cpp: sqlite3_result_error(ctx, "no regexp", -1);
ogr/ogrsf_frmts/sqlite/ogrsqlitesqlfunctions.cpp:#include "ogrsqliteregexp.cpp" /* yes the .cpp file, to make it work on Windows with load_extension('gdalXX.dll') */
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp:#include <regex>
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: // std::regex in gcc < 4.9 is broken
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: static int hasWorkingRegex = std::regex_match("c", std::regex("a|b|c"));
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: static const std::regex sFieldIdentifierRe {
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: std::regex_constants::icase};
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: if( std::regex_search(fieldStr, uniqueFieldMatch, sFieldIdentifierRe) )
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: static const std::regex sFieldIndexIdentifierRe {
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: if( std::regex_search(indexDefinition, uniqueFieldMatch,
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: catch( const std::regex_error& e )
ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp: CPLError(CE_Failure, CPLE_AppDefined, "regex_error: %s", e.what());
$ grep regex */*/*.cpp
frmts/rasdaman/rasdamandataset.cpp:#include "regex.h"
frmts/rasdaman/rasdamandataset.cpp: // regex for parsing options
frmts/rasdaman/rasdamandataset.cpp: regex_t optionRegEx;
frmts/rasdaman/rasdamandataset.cpp: // regex for parsing query
frmts/rasdaman/rasdamandataset.cpp: regex_t queryRegEx;
frmts/rasdaman/rasdamandataset.cpp: CPLError(CE_Failure, CPLE_AppDefined, "Internal error at compiling option parsing regex: %s", errbuffer);
frmts/rasdaman/rasdamandataset.cpp: CPLError(CE_Failure, CPLE_AppDefined, "Internal error at compiling option parsing regex: %s", errbuffer);
frmts/rasdaman/rasdamandataset.cpp: // executing option parsing regex on the connection string and checking if it succeeds
frmts/rasdaman/rasdamandataset.cpp: result = regexec(&optionRegEx, connString, 10, matches, 0);
frmts/rasdaman/rasdamandataset.cpp: result = regexec(&queryRegEx, queryParam, 10, matches, 0);
frmts/vrt/vrtderivedrasterband.cpp: "fromregex", // numpy.fromregex
With patched gdal (to assert std::regex is buggy in SQLGetUniqueFieldUCConstraints), Roger's testcase finishes for me (on a system where it got stuck before):
> nc <- sf::st_read(system.file("gpkg/nc.gpkg", package = "sf"))
Reading layer `nc.gpkg' from data source
`C:\msys64\home\tomas\trunk\library\sf\gpkg\nc.gpkg' using driver `GPKG'
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
> names(nc)[4] <- "cnty_unique_id"
> sf::st_write(nc, "nc_test0.gpkg")
Writing layer `nc_test0' to data source `nc_test0.gpkg' using driver `GPKG'
Writing 100 features with 14 fields and geometry type Multi Polygon.
> sf::st_read("nc_test0.gpkg")
Reading layer `nc_test0' from data source
`C:\msys64\home\tomas\trunk\src\gnuwin32\nc_test0.gpkg' using driver `GPKG'
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
This new build of Rtools42 is now at https://www.r-project.org/nosvn/winutf8/ucrt3 (file gcc10_ucrt3_full_5336.tar.zst, svn revision 5336). I took this opportunity to re-generate the patch for gdal-3.5, but the only real change is:
diff -Nru gdal-3.5.0-orig/ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp gdal-3.5.0-patched/ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp
--- gdal-3.5.0-orig/ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp 2022-05-10 10:03:38.000000000 -0400
+++ gdal-3.5.0-patched/ogr/ogrsf_frmts/sqlite/ogrsqliteutility.cpp 2022-09-13 14:35:48.582070744 -0400
@@ -349,7 +349,8 @@
std::set<std::string> uniqueFieldsUC;
// std::regex in gcc < 4.9 is broken
-#if !defined(__GNUC__) || defined(__clang__) || __GNUC__ >= 5
+ // seems still broken in newer versions, including gcc 10.3
+#if 0
try
{
Thanks @kalibera , very useful! Could you perhaps please provide links to updated sf and terra binaries for @dblodgett-usgs and @kadyb and others? This would enlarge the pool of testers.
Is perhaps the MXE/Msys2 regex interacting with UCRT? The GDAL Msys2 CI is likely to be using legacy CRT, although I cannot see where that choice would be made in https://github.com/OSGeo/gdal/tree/master/.github/workflows/mingw_w64. So this R/MXE/UCRT detection of a string handling problem might also be attacked by re-checking the std::regex component, provided I think by libsystre. I see that a ucrt variant is available, but maybe not tested; the underlying version 1.0.1 seems to be 8 years old, so pre-UCRT.
The Rasdaman GDAL raster driver isn't built in MXE libgdal, it needs raslib which nobody has asked for. The HANA vector driver needs odbc-cpp-wrapper, but again nobody has asked for it. So only the SQLite and GPKG vector drivers touch code in GDAL using regex.
The binaries for sf and terra are not built, yet, because the system works in cycles and hasn't yet gotten to building them with the new Rtools42. I've just tested locally using the new build of Rtools42. https://www.r-project.org/nosvn/winutf8/ucrt3 has a stamp file, which is currently r_packages_5330_built_by_R-devel-win-82844-5286-5245.exe.stamp. Once the second number after R-devel-win (now 5286), becomes 5336, the new binaries are ready under https://www.r-project.org/nosvn/winutf8/ucrt3/CRAN/bin/windows/contrib/4.3/
Looking at the CI workflow (but not knowing anything about it), it runs on Debian or Ubuntu Linux in wine and builds gdal using mingw cross-compilers available for Linux in the distribution. These target MSVCRT. For checking gdal for use with R, it would be possible to use the cross-compilers distributed with Rtools42, instead, which target UCRT. Ideally with all the pre-built static libraries which are included in Rtools42. The workflow script would not have to change much.
Thanks again @kalibera . Once 5336 has propagated, will it be possible to extend coverage to R-patched - so contrib/4.2 ? If that could be done, and if so, when in place, what would be the best way to trigger fresh binary builds of packages using GDAL?
The CI I was referring to is https://github.com/OSGeo/gdal/blob/8e4c84b5103a832f5f0ca507ce2eec36f479f994/.github/workflows/cmake_builds.yml#L298 which uses mingw64 / msys2 on a Windows builder (the mingw_w64 one is a bit kludgy) Anyway it really looks like this is a toolchain issue. Disabling std::regex usage in GDAL in SQLGetUniqueFieldUCConstraints() is a workaround, and should ideally be fixed more cleanly. We might want to use std::regex in GDAL in the future in parts that can't be disabled.
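As a rough illustration of what a regex-free check could look like (this is not the fix that went into GDAL or Rtools42, just a sketch under stated assumptions), the helper below scans a single column definition and reports UNIQUE only when it appears as a standalone keyword outside quoted identifiers. The name has_unique_keyword and the simplified quoting rules are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not GDAL code): true when the column definition
// contains UNIQUE as a standalone keyword outside quoted text.
// Assumes quotes are ", ' or ` and that a quote closes on the same character.
bool has_unique_keyword(const std::string& field_def) {
    std::string word;   // current unquoted word, upper-cased
    char quote = '\0';  // active quote character, or '\0' when unquoted
    // Iterate one position past the end so the last word is flushed.
    for (std::size_t i = 0; i <= field_def.size(); ++i) {
        const char c = i < field_def.size() ? field_def[i] : ' ';
        if (quote != '\0') {
            // Inside a quoted identifier or literal: skip its content.
            if (c == quote) quote = '\0';
            continue;
        }
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalnum(uc) || c == '_') {
            word += static_cast<char>(std::toupper(uc));
            continue;
        }
        // Any other character ends the current word.
        if (word == "UNIQUE") return true;
        word.clear();
        if (c == '"' || c == '\'' || c == '`') quote = c;
    }
    return false;
}
```

With this sketch, `"cnty_unique_id" DOUBLE` and `cnty_unique_id DOUBLE` are both rejected, while `"id" INTEGER UNIQUE` is accepted. It does not extract the field name or handle table-level constraints, which the real function also has to do.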
@rouault yes, the underlying problem seems to be a MSVCRT libsystre in Msys2, which only fails in some settings (my guess). Windows binaries for R are an early adopter of UCRT among FOSS: https://blog.r-project.org/, particularly https://blog.r-project.org/2021/12/07/upcoming-changes-in-r-4.2-on-windows/index.html https://blog.r-project.org/2022/06/16/upcoming-changes-in-r-4.2.1-on-windows/index.html. It appears that OSGeo4W are MSVCRT, possibly because they move at the speed of the slowest included library or application. The patch @kalibera adds is only in the MXE UCRT build train for immediate protection because a working GPKG driver is vital now. Fixing (or checking) the upstream Msys2 libsystre for behaviour under UCRT would be preferable, but it is upstream of this case. I guess @kalibera, who is in touch with MXE developers, could raise an issue there, but creating a clear test is not easy.
Thanks again @kalibera . Once 5336 has propagated, will it be possible to extend coverage to R-patched - so contrib/4.2 ? If that could be done, and if so, when in place, what would be the best way to trigger fresh binary builds of packages using GDAL?
No, I am not building packages for R 4.2 with unreleased versions of Rtools42, only with R-devel, that would require too much of computational resources. You can test with R-devel (using my binaries or building the packages from source) and using R 4.2 (building the packages from source).
For build 5286-5107 of Rtools42 (and subsequent), could those testing please note that sf src/Makefile.ucrt needs updating to add in -lkea -lhdf5_hl -lhdf5_cpp where -lhdf5_hl is now. In response to a user request, the KEA driver was added in this build of Rtools42. The same applies to terra, etc. rtools42-5253-5107-signed.exe is released Rtools42 matching the src/Makefile.ucrt files currently present in the source packages. @kalibera is there a way of conditioning on the Rtools build number in src/Makefile.ucrt - if so, do you have an example of a package using version conditioning?
We might want to use std::regex in GDAL in the future in parts that can't be disabled.
There are known issues with std::regex in multi-byte locales (R runs in UTF-8). See below. The tesseract thread includes a repro that was as far as I understand done using a different UCRT toolchain (not Rtools42). The GCC bug report says that std::regex may get deprecated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723 https://github.com/tesseract-ocr/tesseract/issues/3830 https://github.com/sg16-unicode/sg16/issues/57
When I try to sf::st_layers() on the attached gpkg, rstudio bombs. When I list from a terminal, I see:
Reproduce with:
nav_06.zip