ropensci / FedData

Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources
https://docs.ropensci.org/FedData
Other
94 stars 22 forks source link

get_nhd() fails for templates with no NHD point data #98

Closed kevinwolz closed 1 year ago

kevinwolz commented 1 year ago

get_nhd() fails for templates with no NHD "Point" data because THIS line is hard-coded to attempt to extract the CRS from nhd_out[[1]]. The first element of nhd_out is always the NHD "Point" data. If there are not "Point" data returned by esri_query(), then this element is NULL.

HERE is an example polygon that triggers this.

One option to fix this would be to add the following code HERE:

null_elements <- purrr::map_lgl(nhd_out, is.null)
if(all(null_elements)) stop("No NHD data present within template.")
nhd_for_crs <- nhd_out[!null_elements][[1]]

And then replace nhd_out[[1]] with nhd_for_crs.

This ensures that you will get a CRS regardless of which NHD elements might happen to be missing in a given area. Furthermore, an error would be thrown if no NHD data of any type is returned for the template area (which is certainly possible within a small aoi).

Jordan-Heiman commented 1 year ago

Not sure if it is related or not but I am also having trouble with get_nhd(). I'm using a simple rectangle created from a bounding box to try to pull an NHD dataset. The rectangle has worked for other FedData functions but every time I run get_nhd() I get the following:

> cam_bbox
Geometry set for 1 feature 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 245359.3 ymin: 5349970 xmax: 333488.2 ymax: 5434257
Projected CRS: WGS 84 / UTM zone 12N
POLYGON ((245359.3 5349970, 333488.2 5349970, 3...
> hydro <- get_nhd(template = cam_bbox,
+                  label = "GNP",
+                  extraction.dir = "./1.Data/FedLayers/Hydro")
Error in `purrr::map()`:
ℹ In index: 4.
ℹ With name: Area - Large Scale.
Caused by error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: 1.
Caused by error in `CPL_get_z_range()`:
! z error - expecting three columns;
Run `rlang::last_error()` to see where the error occurred.
bocinsky commented 1 year ago

Adding the small_aoi as a geojson here, for quick reference with testing.

small_aoi.geojson.zip

bocinsky commented 1 year ago

Fixed the primary issue by implementing the solution by @kevinwolz. Thank you for the suggestion, and your patience! @Jordan-Heiman, I think I also corrected your issue, which had to do with some of the features in the NHD having Z coordinates. Sending to CRAN today!

Jordan-Heiman commented 1 year ago

Hi @bocinsky, I'm still having trouble with the get_nhd function. I updated to 3.0.4 but am getting an error even using the example meve data in the package. I really appreciate the troubleshooting!

Data from package:

> get_nhd(
+   template = FedData::meve,
+   label = "meve"
+ )
Error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: Point.
Caused by error:
! lexical error: invalid char in json text.
                                       <!DOCTYPE HTML PUBLIC "-//W3C//
                     (right here) ------^
Run `rlang::last_trace()` to see where the error occurred.

Raster mask from dataset

> raster_mask
class       : SpatRaster 
dimensions  : 25625, 16042, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 2259841, 2741101, 1214089, 1982839  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Idaho Transverse Mercator (EPSG:8826) 
source(s)   : memory
name        : lyr.1 
min value   :     0 
max value   :     0 
> get_nhd(template = raster_mask,
+         label = "red_fox", 
+         nhdplus = FALSE,
+         extraction.dir = "./1.Data/2.Covariate_Layers/FedLayers/Hydro",
+         force.redo = TRUE)
Error in `purrr::map()`:
ℹ In index: 2.
ℹ With name: Flowline - Large Scale.
Caused by error:
! lexical error: invalid char in json text.
                                       <!DOCTYPE HTML PUBLIC "-//W3C//
                     (right here) ------^
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/purrr_error_indexed>
Error in `purrr::map()`:
ℹ In index: 2.
ℹ With name: Flowline - Large Scale.
Caused by error:
! lexical error: invalid char in json text.
                                       <!DOCTYPE HTML PUBLIC "-//W3C//
                     (right here) ------^
---
Backtrace:
     ▆
  1. ├─FedData::get_nhd(...)
  2. │ └─... %$% ...
  3. ├─base::with(...)
  4. ├─FedData:::esri_query(...)
  5. │ └─layer_ids %>% ...
  6. ├─purrr::map(...)
  7. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
  8. │   ├─purrr:::with_indexed_errors(...)
  9. │   │ └─base::withCallingHandlers(...)
 10. │   ├─purrr:::call_with_cleanup(...)
 11. │   └─FedData (local) .f(.x[[i]], ...)
 12. │     └─... %>% magrittr::extract2("objectIds")
 13. └─jsonlite::fromJSON(.)
 14.   └─jsonlite:::parse_and_simplify(...)
 15.     └─jsonlite:::parseJSON(txt, bigint_as_char)
 16.       └─jsonlite:::parse_string(txt, bigint_as_char)
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
bocinsky commented 1 year ago

hi. Yes, there was a different bug that crept in, but it is fixed on the dev version (not yet on CRAN).

Install the dev version with:

devtools::install_github("ropensci/FedData")

and then give it another try.

Jordan-Heiman commented 1 year ago

Thanks for the quick response. Using the dev version fixed the problem with the meve dataset but I am still getting the same error when using my own bounding box layer or a raster mask.

bocinsky commented 1 year ago

Hi @Jordan-Heiman. Sorry for the super long delay in response. It seems to me that this is choking because of the size of the region you are trying to download, and there is no clear way I've found to see what an acceptable size would be... it varies given the constraints of the server that is hosting the dataset. For your use case (and you've probably already done this), I'd just download the complete state from the NHD pre-staged subset server.

Here are two ways to do it (it is ~400MB, so takes some time):

library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

## Download to a temporary file
tmp <- tempfile(fileext = ".zip")
httr::GET(
  "https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/State/GDB/NHD_H_Idaho_State_GDB.zip",
  httr::write_disk(tmp))
#> Response [https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/State/GDB/NHD_H_Idaho_State_GDB.zip]
#>   Date: 2023-07-28 20:46
#>   Status: 200
#>   Content-Type: binary/octet-stream
#>   Size: 419 MB
#> <ON DISK>  /var/folders/ys/7l0z3wlx7z14qxn9v0m9ckhw0000gq/T//RtmpuiLVwf/file6ad56224fac4.zip

## Load straight from the zip directory
tmp_vsi <- paste0("/vsizip/", tmp, "/NHD_H_Idaho_State_GDB.gdb/")
sf::read_sf(tmp_vsi, 
            layer = "NHDWaterbody",
            quiet = TRUE
            )
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL
#> Message 1: organizePolygons() received a polygon with more than 100 parts. The
#> processing may be really slow.  You can skip the processing by setting
#> METHOD=SKIP, or only make it analyze counter-clock wise parts by setting
#> METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock
#> wise defined
#> Simple feature collection with 74094 features and 13 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XYZ
#> Bounding box:  xmin: -118.5415 ymin: 40.67095 xmax: -110.1605 ymax: 50.21636
#> z_range:       zmin: 0 zmax: 0
#> Geodetic CRS:  NAD83
#> # A tibble: 74,094 × 14
#>    permanent_identifier fdate               resolution gnis_id  gnis_name       
#>    <chr>                <dttm>                   <int> <chr>    <chr>           
#>  1 141467365            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  2 56465275             2020-11-23 17:00:00          2 00371496 Big Fill Reserv…
#>  3 141467608            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  4 141467649            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  5 141467353            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  6 141467755            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  7 141467638            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  8 163906002            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  9 141467156            2021-01-13 17:00:00          2 <NA>     <NA>            
#> 10 141467343            2021-01-13 17:00:00          2 <NA>     <NA>            
#> # ℹ 74,084 more rows
#> # ℹ 9 more variables: areasqkm <dbl>, elevation <dbl>, reachcode <chr>,
#> #   ftype <int>, fcode <int>, visibilityfilter <int>, SHAPE_Length <dbl>,
#> #   SHAPE_Area <dbl>, SHAPE <MULTIPOLYGON [°]>

# There are many other layers. View them with:
sf::st_layers(tmp_vsi)
#> Driver: OpenFileGDB 
#> Available layers:
#>                 layer_name                 geometry_type features fields
#> 1        ExternalCrosswalk                            NA    42930      6
#> 2                 NHDFCode                            NA      126     13
#> 3     NHDFeatureToMetadata                            NA  2667709      2
#> 4                  NHDFlow                            NA        0      4
#> 5           NHDFlowlineVAA                            NA   615145     22
#> 6              NHDMetadata                            NA     1855     21
#> 7  NHDProcessingParameters                            NA        5      2
#> 8  NHDReachCodeMaintenance                            NA   562561      6
#> 9   NHDReachCrossReference                            NA   611699     11
#> 10       NHDSourceCitation                            NA     2137     13
#> 11               NHDStatus                            NA        0      3
#> 12 NHDVerticalRelationship                            NA      506      3
#> 13            NHDWaterbody              3D Multi Polygon    74094     13
#> 14         NHDPointEventFC                         Point     8196     14
#> 15                NHDPoint                      3D Point    25036      8
#> 16          NHDLineEventFC             Multi Line String        0     16
#> 17                 NHDLine          3D Multi Line String     1705     10
#> 18             NHDFlowline 3D Measured Multi Line String   638414     15
#> 19          NHDAreaEventFC                 Multi Polygon        0     14
#> 20                 NHDArea              3D Multi Polygon     3330     12
#> 21                 WBDLine             Multi Line String    14690      7
#> 22                  WBDHU8                 Multi Polygon       92     14
#> 23                  WBDHU6                 Multi Polygon       14     14
#> 24                  WBDHU4                 Multi Polygon        7     14
#> 25                  WBDHU2                 Multi Polygon        3     14
#> 26                 WBDHU16                 Multi Polygon        0     19
#> 27                 WBDHU14                 Multi Polygon        0     19
#> 28                 WBDHU12                 Multi Polygon     3799     19
#> 29                 WBDHU10                 Multi Polygon      733     16
#>    crs_name
#> 1      <NA>
#> 2      <NA>
#> 3      <NA>
#> 4      <NA>
#> 5      <NA>
#> 6      <NA>
#> 7      <NA>
#> 8      <NA>
#> 9      <NA>
#> 10     <NA>
#> 11     <NA>
#> 12     <NA>
#> 13    NAD83
#> 14    NAD83
#> 15    NAD83
#> 16    NAD83
#> 17    NAD83
#> 18    NAD83
#> 19    NAD83
#> 20    NAD83
#> 21    NAD83
#> 22    NAD83
#> 23    NAD83
#> 24    NAD83
#> 25    NAD83
#> 26    NAD83
#> 27    NAD83
#> 28    NAD83
#> 29    NAD83

## Or, more simply, if you just want one layer:
sf::read_sf(
  "/vsizip//vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/State/GDB/NHD_H_Idaho_State_GDB.zip/NHD_H_Idaho_State_GDB.gdb/",
  layer = "NHDWaterbody",
  quiet = TRUE
)
#> Simple feature collection with 74094 features and 13 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XYZ
#> Bounding box:  xmin: -118.5415 ymin: 40.67095 xmax: -110.1605 ymax: 50.21636
#> z_range:       zmin: 0 zmax: 0
#> Geodetic CRS:  NAD83
#> # A tibble: 74,094 × 14
#>    permanent_identifier fdate               resolution gnis_id  gnis_name       
#>    <chr>                <dttm>                   <int> <chr>    <chr>           
#>  1 141467365            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  2 56465275             2020-11-23 17:00:00          2 00371496 Big Fill Reserv…
#>  3 141467608            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  4 141467649            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  5 141467353            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  6 141467755            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  7 141467638            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  8 163906002            2021-01-13 17:00:00          2 <NA>     <NA>            
#>  9 141467156            2021-01-13 17:00:00          2 <NA>     <NA>            
#> 10 141467343            2021-01-13 17:00:00          2 <NA>     <NA>            
#> # ℹ 74,084 more rows
#> # ℹ 9 more variables: areasqkm <dbl>, elevation <dbl>, reachcode <chr>,
#> #   ftype <int>, fcode <int>, visibilityfilter <int>, SHAPE_Length <dbl>,
#> #   SHAPE_Area <dbl>, SHAPE <MULTIPOLYGON [°]>

Created on 2023-07-28 with reprex v2.0.2

Jordan-Heiman commented 1 year ago

I appreciate you circling around to this. I thought the large region might be the source of the problem. I was able to just download the state-wide area and am using that.