r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.32k stars 293 forks source link

Open layers of geodatabase with st_read() and a wkt_filter #1871

Closed glorious1 closed 1 year ago

glorious1 commented 2 years ago

I'm trying to open some layers in the US National Hydrography Dataset cropped to my area using st_read with a wkt_filter. I always get an empty sf object; has all the column names and crs but no rows. I've made the wkt_filter much bigger than necessary. The layers are in NAD83 (EPSG 4269), which uses degrees, so I don't think that is the problem.

I once was able to open an entire layer without a wkt_filter, but it was so huge R crashed when I tried to crop it. Also, I am able to download non-spatial tables from it no problem.

Here's what I'm doing:

library(sf)
URL <- "/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip"
WKT <- "POLYGON ((-112.6 33.4, -103.6 33.4, -103.6 40.2, -112.6 40.2, -112.6 33.4))"

STR <- st_read( dsn=URL, layer="NHDFlowline", wkt_filter=WKT, quiet=FALSE )
STRarea <- st_read( dsn=URL, layer="NHDArea", wkt_filter=WKT, quiet=FALSE )
LAK <- st_read( dsn=URL, layer="NHDWaterbody", wkt_filter=WKT, quiet=FALSE )

I tried downloading the whole 23 GB zip locally and the same thing happens; wkt_filter returns nothing. I have successfully used wkt_filter on other zipped gdb's. NHD support says this is a standard ESRI GDB; they don't use R so can't help with this.

I'm aware that it is bad to have multiple copies of GEOS, GDAL, or PROJ on the system, but I'm a bit confused about that. Near as I can tell I have only the versions provided by Homebrew: GEOS 3.10.1.1 GDAL 3.3.3_1 Proj 7.2.1 However, when I load sf, it says: "Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1" I can't find any evidence of those versions of GEOS and GDAL. If I remove the homebrew directories from my PATH, sf also opens with: "sf_use_s2() is TRUE"

edzer commented 2 years ago

However, when I load sf, it says: "Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1"

This may come from a binary package install, which includes these three libraries (ignoring homebrew).

glorious1 commented 2 years ago

Thank you, that solves that mystery. Any idea why wkt_filter won't work in this case? By the way, sf is version 1.0-5.

edzer commented 2 years ago

The gdalinfo command line utility doesn't return anything useful when given this data source, so I guess it can't be ready by GDAL.

glorious1 commented 2 years ago

Not sure if this sheds any light, but gdalUtils::ogrinfo can read it, at least field names

library(gdalUtils)
gdalUtils::gdal_setInstallation( search_path = "/opt/homebrew/bin/" )
ogrinfo( URL, "NHDWaterbody", ro=TRUE, so=TRUE )  # lakes and such

My help for gdalinfo() seems to indicate it is for raster datasets, which this is not. Also, st_read can download layers and tables from it; it's just wkt_filter that seems to be not working with it.

edzer commented 2 years ago

Right, I see

ogrinfo -so "/vsizip/vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip" NHDFlowline
INFO: Open of `/vsizip/vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip'
      using driver `OpenFileGDB' successful.

Layer name: NHDFlowline
Geometry: 3D Measured Multi Line String
Feature Count: 29321407
Extent: (-179.150067, -14.374101) - (179.775127, 71.390380)
Layer SRS WKT:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Geodesy."],
        AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands.  British Virgin Islands."],
        BBOX[14.92,167.65,86.46,-47.74]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
FID Column = OBJECTID
Geometry Column = SHAPE
PERMANENT_IDENTIFIER: String (40.0) NOT NULL
FDATE: DateTime (0.0) NOT NULL
RESOLUTION: Integer (0.0) NOT NULL, domain name=Resolution
GNIS_ID: String (10.0)
GNIS_NAME: String (65.0)
LENGTHKM: Real (0.0)
REACHCODE: String (14.0)
FLOWDIR: Integer (0.0) NOT NULL DEFAULT 0, domain name=HydroFlowDirections
WBAREA_PERMANENT_IDENTIFIER: String (40.0)
FTYPE: Integer (0.0) NOT NULL DEFAULT 460
FCODE: Integer (0.0) DEFAULT 46003
GLOBALID: String (0.0) NOT NULL
INNETWORK: Integer (0.0) DEFAULT 1, domain name=NoYes Domain_3
MAINPATH: Integer (0.0) NOT NULL DEFAULT 0, domain name=MainPath Domain_3
VISIBILITYFILTER: Integer (0.0) NOT NULL DEFAULT 0, domain name=VisibilityFilter Domain_6
SHAPE_Length: Real (0.0)

and

> STR <- st_read( dsn=URL, layer="NHDFlowline", wkt_filter=WKT, quiet=FALSE )
Reading layer `NHDFlowline' from data source 
  `/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip' 
  using driver `OpenFileGDB'
Simple feature collection with 0 features and 16 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  NAD83

Are you sure there are features inside your polygon?

glorious1 commented 2 years ago

That's most of 4 western states, and the full layer has 29 million streams with many in the polygon. I'll check again to be sure I haven't messed up the WKT polygon, but it will have to wait a while now; I'll get back to it later. Thanks for your help!

glorious1 commented 2 years ago

I've verified with an online WKT viewer that my polygon is what I thought it was. More importantly, I verified that the layers have features in that polygon, none of which are extracted when using wkt_filter. I downloaded and cropped the "lightest" layer I could find.

URL <- "/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip"
WKT <- WKT <- "POLYGON ((-112.6 33.4, -103.6 33.4, -103.6 40.2, -112.6 40.2, -112.6 33.4))"
# Same polygon as WKT in form for cropping:
LIMS <- c( xmin=-112.6, ymin=33.4, xmax=-103.6, ymax=40.2 )

Using wkt_filter returns empty object:

> X <- st_read( dsn=URL, layer="NHDPointEventFC", wkt_filter=WKT, quiet=FALSE )
Reading layer `NHDPointEventFC' from data source 
  `/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip' 
  using driver `OpenFileGDB'
Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  NAD83

Open the entire layer without wkt_filter and crop it:

> X <- st_read( dsn=URL, layer="NHDPointEventFC", quiet=FALSE )
Reading layer `NHDPointEventFC' from data source 
  `/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/National/HighResolution/GDB/NHD_H_National_GDB.zip' 
  using driver `OpenFileGDB'
Simple feature collection with 285690 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -170.7169 ymin: -14.31776 xmax: 145.7813 ymax: 71.38956
Geodetic CRS:  NAD83

# Crop it and get 20,000+ features:
> Xc <- st_crop( x=X, y=LIMS )
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 
> Xc
Simple feature collection with 22447 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -112.5968 ymin: 33.41801 xmax: -103.6026 ymax: 40.28514
Geodetic CRS:  NAD83 [ . . . ]
edzer commented 2 years ago

Here is an example where wkt_filter does seem to work:

library(sf)
# Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE
demo(nc, echo = FALSE)
# Reading layer `nc.gpkg' from data source 
#   `/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg' 
#   using driver `GPKG'
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# Geodetic CRS:  NAD27
plot(st_geometry(nc), axes = TRUE)
wkt = "POLYGON((-84 35,-82 35,-82 37,-84 37,-84 35))"
plot(st_as_sfc(wkt), add = TRUE)
r = st_read(datasource, wkt_filter = wkt)
# Reading layer `nc.gpkg' from data source 
#   `/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg' 
#   using driver `GPKG'
# Simple feature collection with 17 features and 14 fields
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: -84.32385 ymin: 34.98822 xmax: -81.68432 ymax: 36.29075
# Geodetic CRS:  NAD27
plot(st_geometry(r), col = 'red', add = TRUE)
glorious1 commented 2 years ago

Thank you for the example. That's very nice. As mentioned, I have used wkt_filter successfully, even on other GDBs. Just not this one.

glorious1 commented 2 years ago

Just a followup. Although wkt_filter doesn't work on the nationwide NHD gdb, that database is also available for each state, and it does work on those.


WKT <- "POLYGON ((-105.3 36.9, -109.1 36.9, -109.1 39.45, -105.3 39.45, -105.3 36.9))"
STATE <- "Colorado" 
URL <- paste0( "/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/State/HighResolution/GDB/NHD_H_",
               STATE, "_State_GDB.zip" )

> STR <- st_read( dsn=URL, layer="NHDFlowline", wkt_filter=WKT, quiet=FALSE )
Reading layer `NHDFlowline' from data source 
  `/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/State/HighResolution/GDB/NHD_H_Colorado_State_GDB.zip' 
  using driver `OpenFileGDB'
Simple feature collection with 433181 features and 15 fields
Geometry type: MULTILINESTRING
Dimension:     XYZM
Bounding box:  xmin: -109.1489 ymin: 36.85061 xmax: -105.1593 ymax: 39.49173
z_range:       zmin: 0 zmax: 0
m_range:       mmin: 0 mmax: 100
Geodetic CRS:  NAD83