ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 20 forks source link

`mukey.wcs()`: fix areas containing all NoData and resolution for AOI at edge of WCS extent #306

Closed brownag closed 11 months ago

brownag commented 1 year ago

Fix for #304, issue raised by @kevinwolz who provided example AOI in that issue from the FL keys

Uses terra::as.factor() for result that has categorical representation of mukey (resolves TODO)

library(soilDB)
library(sf)
setwd("~/workspace/gh/soilDB/304")
aoi <- sf::st_read("/vsizip/aoi_FL_keys.kml.zip")
#> Reading layer `aoi_FL_keys' from data source `/vsizip/aoi_FL_keys.kml.zip' using driver `KML'
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -82.32659 ymin: 24.47644 xmax: -81.97248 ymax: 24.75134
#> Geodetic CRS:  WGS 84
res <- soilDB::mukey.wcs(aoi = aoi, db = "gssurgo", res = 30, quiet = FALSE)
#> Warning: expected 1167 rows, received 893; Y resolution may be affected. Try
#> requesting a smaller extent.
#> Warning: [] no labels
res
#> class       : SpatRaster 
#> dimensions  : 893, 1344, 1  (nrow, ncol, nlyr)
#> resolution  : 30, 30.0127  (x, y)
#> extent      : 1396280, 1436600, 270015, 296816.3  (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
#> source(s)   : memory
#> name        : mukey 
#> min value   :   NaN 
#> max value   :   NaN

The warnings indicate that the coverage service does not return the expected grid size. The corresponding dimension(s) out of bounds may have their resolution affected (i.e. they do not match the input res argument, or the other dimension)

Note that the warnings are "expected" but unfortunate result of the boundary of the source raster for the CONUS gSSURGO on SoilWeb. We have most commonly encountered this issue with raster soil surveys which are of limited extent. One server side option we could implement to avoid bumping up against resolution problems frequently is increase the buffer of NoData surrounding the current extent.

dylanbeaudette commented 1 year ago

Thanks for responding to this issue so quickly. I've been wondering lately about these edge cases. I'll do some research on how WCS can / should respond.

Related TODO items

For later, some test cases

# entirely outside of CONUS
bb <- 'POLYGON((-118.5392 31.4338,-118.5392 31.5660,-118.2850 31.5660,-118.2850 31.4338,-118.5392 31.4338))'
bb <- vect(bb, crs = 'epsg:4326')
mu <- mukey.wcs(bb)

plot(mu, axes = FALSE)

# single NOTCOM polygon
bb <- 'POLYGON ((-113.7389 32.2690,-113.7389 32.2772,-113.7230 32.2772,-113.7230 32.2690,-113.7389 32.2690))'
bb <- vect(bb, crs = 'epsg:4326')
mu <- mukey.wcs(bb)

plot(mu, axes = FALSE)

# all NOTCOM, multiple mukeys used
bb <- 'POLYGON((-113.6034 32.2435,-113.6034 32.2763,-113.5398 32.2763,-113.5398 32.2435,-113.6034 32.2435))'
bb <- vect(bb, crs = 'epsg:4326')
mu <- mukey.wcs(bb)

plot(mu, axes = FALSE)

# partially overlapping with CONUS border
bb <- 'POLYGON((-112.9801 31.8124,-112.9801 31.9440,-112.7259 31.9440,-112.7259 31.8124,-112.9801 31.8124))'
bb <- vect(bb, crs = 'epsg:4326')
mu <- mukey.wcs(bb)

plot(mu, axes = FALSE)
brownag commented 1 year ago

I have added the automatic extension to the input AOI via terra::extend(). I think this is safe to do because the queries are already limited to a grid with dimension 5000 by 5000. This allows the user to make any request that partially overlaps and get a complete result with no modification of server side.

Consider an area to the northern boundary of CONUS ("opposite" the FL keys), including a portion of Lake of the Woods County, Minnesota. Below I show results are identical to make a request that goes out of bounds, and compare with a fully in-bounds request that has been extended to the first request extent.

library(soilDB)
library(terra)

# long narrow aoi, overlaps CONUS raster bounding box
aoi <- vect(ext(c(-96.1054, -94.7109, 49.3305, 52.5623)), crs = "OGC:CRS84")
res <- mukey.wcs(aoi, res = 100)
#> Request partially outside boundary of coverage source data: expected 3522 rows, received 2481
plot(res)

# less long, fits entirely inside CONUS bounding box
aoi <- vect(ext(c(-96.1054, -94.7109, 49.3305, 50.5623)), crs = "OGC:CRS84")
res2 <- mukey.wcs(aoi, res = 100)
plot(res2)

#> > unique(res == extend(res2, res), na.rm = FALSE)
#>  mukey
#> 1     1
#> 2   NaN
brownag commented 11 months ago

Merging this. A more generic fix may be achievable through a pattern similar to what is shown in https://github.com/ncss-tech/soilDB/pull/313