ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 20 forks source link

soilDB::mukey.wcs() fails when there is no mukey data in gSSURGO #304

Closed kevinwolz closed 11 months ago

kevinwolz commented 1 year ago

Attached here is an example AOI, that does not contain any SSURGO data. Querying for mukey data from gSSURGO, results in an error, because there are no mukeys returned:

aoi <- sf::st_read("aoi_FL_keys.kml")
soilDB::mukey.wcs(aoi = aoi, db = "gssurgo", res = 30, quiet = FALSE)
Error in `[.data.frame`(value, , 1) : undefined columns selected

The error occurs because on the line: levels(r)[[1]] <- rat because the rat is a data frame with 0 columns and 0 rows.

It would be much better if, in this case, soilDB::mukey.wcs() still output the resulting raster, but with all cell values set to whatever value is typical when there is no mukey in a given cell.

aoi_FL_keys.kml.zip FYI, the AOI includes the Marquesas Keys in Monroe County, FL (the western-most tip of the Florida Keys). This region comes up in my analysis because it is technically within a US county (based on the tigris county outlines).

brownag commented 1 year ago

Thanks @kevinwolz for bringing this up!

I think I have fixed the issue with the empty RAT in a PR #306. Now a raster with all NaN values ("NoData") is returned. NoData in this case indicates no soil survey area, whereas NOTCOM is an unmapped survey area, which has a NOTCOM mukey and mapunit associated with it.

Specifically for data for the Marquesas Keys--it appears this area is not associated with any soil survey area, and the AOI you are using partially falls outside CONUS gSSURGO boundary in y dimension. FL687 includes up to Key West. I note also there is a (currently NOTCOM) Dry Tortugas SSA FL687 even further to the west. image


This raises another issue with AOIs at the "edge of the map." We have encountered this before, especially with the raster soil surveys. Currently AOIs that are too large for the source data extent generate warnings because the result from the web service does not match the expected extent of the AOI input. This affects the resolution which could prove to be an issue for user expecting a consistent grid from a set of queries.

Here is an example for a slightly larger AOI that contains some data:

library(soilDB)
library(sf)
setwd("~/workspace/gh/soilDB/304")
aoi <- sf::st_read("/vsizip/aoi_FL_keys.kml.zip")
#> Reading layer `aoi_FL_keys' from data source `/vsizip/aoi_FL_keys.kml.zip' using driver `KML'
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -82.32659 ymin: 24.47644 xmax: -81.97248 ymax: 24.75134
#> Geodetic CRS:  WGS 84

res0 <- soilDB::mukey.wcs(aoi = aoi, db = "gssurgo", res = 30, quiet = FALSE)
#> Warning: expected 1167 rows, received 893; Y resolution may be affected. Try
#> requesting a smaller extent.
#> Warning: [] no labels

res <- soilDB::mukey.wcs(aoi = st_buffer(aoi, 45000), db = "gssurgo", res = 30, quiet = FALSE)
#> Warning: expected 4568 rows, received 2599; Y resolution may be affected. Try
#> requesting a smaller extent.

# some data, not full extent
res
#> class       : SpatRaster 
#> dimensions  : 2599, 4796, 1  (nrow, ncol, nlyr)
#> resolution  : 30, 30.00172  (x, y)
#> extent      : 1344822, 1488702, 270015, 347989.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
#> source(s)   : memory
#> categories  : filed0cd28bfe084 
#> name        :   mukey 
#> min value   :  797593 
#> max value   : 3116266

terra::plot(res)
terra::plot(terra::project(terra::vect(aoi), res), add = TRUE)

Right now, other than creating the categories, there is little processing of the grid result after it is downloaded... it could be NaN-padded/terra::extend()-ed to match input dimensions. At a minimum the set resolution should be fixed for the portion of the raster that returns, and probably the warnings about not getting data for full extent should be retained. I will revisit the logic on grid resolution

kevinwolz commented 1 year ago

Thanks for the quick fix. Yes, I have definitely run into issues as a result of the non-integer resolution that does not match the input resolution. To clarify, did you address that as well?

It looks like you added a fix for this but are still warning the user about the different number of rows received?

brownag commented 1 year ago

Thanks for the quick fix. Yes, I have definitely run into issues as a result of the non-integer resolution that does not match the input resolution. To clarify, did you address that as well?

Yes, I think resolution is addressed now in https://github.com/ncss-tech/soilDB/commit/3cc97a9ec5b167c875fd043518fa299b46ddb690 which we can probably merge soon after a little testing

The spatial extent of the WCS result in edge cases is not what it should be given the number of rows and columns returned, so there appears to be a slight offset. It seems that (at least with certain input geometry) you will get a fraction of a pixel cut off the extent. I am not sure if that is something we can fix on the server side (@dylanbeaudette), but I believe it has now been corrected for the final result in R by resetting the extent of the SpatRaster.

It looks like you added a fix for this but are still warning the user about the different number of rows received?

Yeah, in the PR it is still a warning. Perhaps it could be downgraded to a message?

Requests that are entirely outside the source grid return "bad WCS request". It might be nice if we could provide a more informative error in these cases.

Requests that are partially overlapping the source grid now return only the portion that overlaps (with corrected resolution). I feel the need for the message is b/c the user requested a larger area, so may expect to receive a raster matching input extent as well as resolution. It would be possible to terra::extend() to the input geometry when a mismatch between expected/actual is detected.


EDIT: in #306 I have added the automatic extension to the input AOI extent. I think this is safe to do because the queries are already limited to a grid with dimension 5000 by 5000. This allows the user to make any request that partially overlaps and get a complete result with no modification of server side.

Consider an area to the northern boundary of CONUS, including a portion of Lake of the Woods County, Minnesota. Below I show results are identical to make a request that goes out of bounds, and compare with a fully in-bounds request that has been extended to the first request extent.

library(soilDB)
library(terra)

aoi <- vect(ext(c(-96.1054, -94.7109, 49.3305, 52.5623)), crs = "OGC:CRS84")
res <- mukey.wcs(aoi, res = 100)
#> Request partially outside boundary of coverage source data: expected 3522 rows, received 2481
plot(res)


aoi <- vect(ext(c(-96.1054, -94.7109, 49.3305, 50.5623)), crs = "OGC:CRS84")
res2 <- mukey.wcs(aoi, res = 100)
plot(res2)

#> > unique(res == extend(res2, res), na.rm = FALSE)
#>  mukey
#> 1     1
#> 2   NaN
kevinwolz commented 1 year ago

@brownag I think the use of terra::ext and terra::expand is a great way to deal with these issues. I agree that returning something that is the same shape and resolution as what the user supplied is the best way to do it. The user can always trim thing later if they want.

My only question/caution is around speed implications. Are the calls to terra::ext and terra::expand going to unnecessarily slow down mukey.wcs when we are not operating in these edge cases? Would it make sense to put these calls behind an if statement to only call them if needed? Or perhaps the terra functions already have this built in to where they only "act" if necessary?

brownag commented 1 year ago

My only question/caution is around speed implications. Are the calls to terra::ext and terra::expand going to unnecessarily slow down mukey.wcs when we are not operating in these edge cases? Would it make sense to put these calls behind an if statement to only call them if needed? Or perhaps the terra functions already have this built in to where they only "act" if necessary?

@kevinwolz I only adjust extent with ext() and call extend() conditionally--so only when the message is issued that the WCS result does not match what is calculated from the input geometry.

Setting a new extent/resolution with ext() is nearly instantaneous, and extend() runs quickly even in the most extreme cases because the input geometry is limited to a grid that is 5000x5000, so you will never be creating a result that is larger than that (in whatever grid resolution you request).

kevinwolz commented 1 year ago

Great! Once you merge, I'll be ready to install and test the hell out of it with every edge case I can find.

dylanbeaudette commented 1 year ago

I'll start a new issue to test the changes against local calls to crop() on a gSSURGO mukey grid. It would be nice to know that the WCS and tinkering with the result don't deviate too much from the source data.

307

kevinwolz commented 11 months ago

Hi all, is this ready for me to test/implement?

dylanbeaudette commented 11 months ago

Sure. However, gNATSGO / gSSURGO grids are still last year's data--FY23.

brownag commented 11 months ago

Hi all, is this ready for me to test/implement?

@kevinwolz PR #306 has been merged and should resolve these problems with the marginal areas of the WCS grids. Sorry for the delay. Let us know when you find a way to break it :)

In the future if you would like to weigh in on in-progress stuff before it is merged into the main branch, you can install directly from a branch or commit e.g. remotes::install_github("ncss-tech/soilDB@fix304") where "@<refname>" refers to the branch name, commit hash, tag, etc.