ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 20 forks source link

SDA_spatialQuery() for multiple features fails #300

Closed dschlaep closed 1 year ago

dschlaep commented 1 year ago

I have a sf object with multiple points and SDA_spatialQuery() doesn't work (anymore) -- despite the documentation suggesting that geom "May contain multiple features".

It does work for an sf object with one multipoint feature or with one point.

print(getNamespaceVersion("soilDB"))
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)
#> version 
#> "2.7.8"
print(getNamespaceVersion("sf"))
#>  version 
#> "1.0-14"
print(getNamespaceVersion("wk"))
#> version 
#> "0.8.0"

# Example from `?SDA_spatialQuery`
p <- sf::st_as_sf(
  data.frame(x = -119.72330, y = 36.92204),
  coords = c('x', 'y'),
  crs = 4326
)

soilDB::SDA_spatialQuery(p, what = 'mukey')
#>    mukey                                               muname
#> 1 464412 Pollasky-Montpellier complex, 9 to 15 percent slopes

# Create sf object with 2 points
p2 <- rbind(p, p)

# Create sf object with one multipoint
pm <- sf::st_combine(p2)

# Extracting "mukey" for sf object with two points doesn't work
soilDB::SDA_spatialQuery(p2, what = 'mukey')
#> NULL

# Extracting "mupolygon" for sf object with one point works
soilDB::SDA_spatialQuery(p, what = 'mupolygon')
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -119.7432 ymin: 36.89392 xmax: -119.7103 ymax: 36.92964
#> Geodetic CRS:  WGS 84
#>    mukey                           geom
#> 1 464412 POLYGON ((-119.7432 36.9036...

# but not for sf object with two points
soilDB::SDA_spatialQuery(p2, what = 'mupolygon')
#> Error in UseMethod("st_as_sfc"): no applicable method for 'st_as_sfc' applied to an object of class "NULL"

# it works again after combining points to multipoint
soilDB::SDA_spatialQuery(pm, what = 'mupolygon')
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -119.7432 ymin: 36.89392 xmax: -119.7103 ymax: 36.92964
#> Geodetic CRS:  WGS 84
#>    mukey                           geom
#> 1 464412 POLYGON ((-119.7432 36.9036...

Created on 2023-09-18 with reprex v2.0.2

I suspect that the conversion to wkt by SDA_spatialQuery() is to blame - it appears that changing wk::wk_collection(wk::as_wkt(geom)) to wk::wk_collection(geom) may do the trick (but I haven't looked into potential inadvertent side-effects of such a change); or it may be an issue with the wk package?

print(getNamespaceVersion("sf"))
#>  version 
#> "1.0-14"
print(getNamespaceVersion("wk"))
#> version 
#> "0.8.0"

p <- sf::st_as_sf(
  data.frame(x = -119.72330, y = 36.92204),
  coords = c('x', 'y'),
  crs = 4326
)

# Create sf object with 2 points
p2 <- rbind(p, p)

# Create sf object with one multipoint
pm <- sf::st_combine(p2)

# # doesn't work (see `SDA-spatial.R` line 413)
wk::wk_collection(wk::as_wkt(p2))
#> <wk_wkt[1] with CRS=EPSG:4326>
#> [1] GEOMETRYCOLLECTION (POINT (-119.7233 36.92204)!!! Expected ',' or ')' but found 'POINT' at byte 46

wk::wk_collection(p2) # works
#> Geometry set for 1 feature 
#> Geometry type: GEOMETRYCOLLECTION
#> Dimension:     XY
#> Bounding box:  xmin: -119.7233 ymin: 36.92204 xmax: -119.7233 ymax: 36.92204
#> Geodetic CRS:  WGS 84
#> GEOMETRYCOLLECTION (POINT (-119.7233 36.92204),...

wk::wk_collection(wk::as_wkt(pm)) # also works
#> <wk_wkt[1] with CRS=EPSG:4326>
#> [1] GEOMETRYCOLLECTION (MULTIPOINT ((-119.7233 36.92204), (-119.7233 36.92204)))

Created on 2023-09-18 with reprex v2.0.2

dylanbeaudette commented 1 year ago

Thanks for the feedback. I'll try to look into this more deeply soon. T

Until then, I did a similar test, using multiple features / object. Not quite the same as "multi-feature" objects (e.g. MULTIPOINT). I'd avoid using multi-objects at this point. Perhaps the documentation and function code can better accommodate them soon.

library(soilDB)
library(terra)
library(sf)

# https://casoilresource.lawr.ucdavis.edu/gmap/?loc=38.88896,-90.42297,z14
# https://casoilresource.lawr.ucdavis.edu/gmap/?loc=38.86297,-90.36821,z14

# two points
p <- data.frame(x = c(-90.42297, -90.36821), y = c(38.88896, 38.86297))
p <- vect(p, geom = c('x', 'y'), crs = 'epsg:4326')

# try polygons
b <- buffer(p, 250)

## duplicate features, works fine
# p <- data.frame(x = c(-90.42297, -90.42297), y = c(38.88896, 38.88896))
# p <- vect(p, geom = c('x', 'y'), crs = 'epsg:4326')

# works with sf objects too
# b <- st_as_sf(b)

# works as expected
(m <- SDA_spatialQuery(b, what = 'mukey'))

# works as expected
(m <- SDA_spatialQuery(b, what = 'mukey', byFeature = TRUE))

# works as expected
m <- SDA_spatialQuery(b, what = 'mupolygon')
plot(m)
lines(b, col = 2, cex = 1.5)

All as expected, working with points or buffered points (polygons).

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-12    terra_1.7-31 soilDB_2.7.8

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10.2      compiler_4.2.2     pillar_1.9.0       class_7.3-22       tools_4.2.2       
 [6] jsonlite_1.8.4     lifecycle_1.0.3    tibble_3.2.1       lattice_0.21-8     pkgconfig_2.0.3   
[11] rlang_1.1.1        DBI_1.1.3          cli_3.6.1          rstudioapi_0.14    curl_5.0.0        
[16] aqp_2.0.2          e1071_1.7-13       xml2_1.3.4         httr_1.4.6         stringr_1.5.0     
[21] dplyr_1.1.2        cluster_2.1.4      systemfonts_1.0.4  generics_0.1.3     vctrs_0.6.2       
[26] classInt_0.4-9     grid_4.2.2         tidyselect_1.2.0   glue_1.6.2         data.table_1.14.8 
[31] R6_2.5.1           textshaping_0.3.6  fansi_1.0.4        sp_1.6-0           farver_2.1.1      
[36] magrittr_2.0.3     codetools_0.2-19   units_0.8-2        ragg_1.2.5         KernSmooth_2.23-21
[41] utf8_1.2.3         stringi_1.7.12     proxy_0.4-27       wk_0.7.3 
dschlaep commented 1 year ago

Thanks, interesting! - This may be a wk issue then. They did some work on the "sfc_writer" between v0.7.3 (the one you have) and v0.8.0 (the one I have installed) -- https://github.com/paleolimbot/wk/releases/tag/v0.8.0

brownag commented 1 year ago

@dschlaep I can confirm this appears to be a regression in {wk}. As far as whether we should have to convert sf objects to wk_* objects before creating a geometry collection... I think it is supposed to work either way.

With wk 0.7.3 we have the following (where creating a collection from wk_wkt class works as expected)

# wk 0.7.3
p <- sf::st_as_sf(
  data.frame(x = -119.72330, y = 36.92204),
  coords = c('x', 'y'),
  crs = 4326
)
p2 <- rbind(p, p)
pm <- sf::st_combine(p2)
wk::wk_collection(wk::as_wkt(p2))
#> <wk_wkt[1] with CRS=EPSG:4326>
#> [1] GEOMETRYCOLLECTION (POINT (-119.7233 36.92204), POINT (-119.7233 36.92204))

But creating from MULTIPOINT fails:

wk::wk_collection(pm)
#> Geometry set for 1 feature 
#> Geometry type: GEOMETRYCOLLECTION
#> Dimension:     XY
#> Bounding box:  xmin: -119.7233 ymin: 36.92204 xmax: -119.7233 ymax: 36.92204
#> Geodetic CRS:  WGS 84
#> Error in UseMethod("st_as_text"): no applicable method for 'st_as_text' applied to an object of class "NULL"

With wk 0.8.0 have a fix (https://github.com/paleolimbot/wk/issues/186) for the error with MULTIPOINT input, but it seems to break the writing of WKT in the former case with GEOMETRYCOLLECTION (POINT (-119.7233 36.92204)!!! Expected ',' or ')' but found 'POINT' at byte 46

# wk 0.8.0
p <- sf::st_as_sf(
  data.frame(x = -119.72330, y = 36.92204),
  coords = c('x', 'y'),
  crs = 4326
)
p2 <- rbind(p, p)
pm <- sf::st_combine(p2)
wk::wk_collection(wk::as_wkt(p2))
#> <wk_wkt[1] with CRS=EPSG:4326>
#> [1] GEOMETRYCOLLECTION (POINT (-119.7233 36.92204)!!! Expected ',' or ')' but found 'POINT' at byte 46
wk::wk_collection(pm)
#> Geometry set for 1 feature 
#> Geometry type: GEOMETRYCOLLECTION
#> Dimension:     XY
#> Bounding box:  xmin: -119.7233 ymin: 36.92204 xmax: -119.7233 ymax: 36.92204
#> Geodetic CRS:  WGS 84
#> GEOMETRYCOLLECTION (MULTIPOINT ((-119.7233 36.9...
brownag commented 1 year ago

@dschlaep @dylanbeaudette I found a tiny order of operations issue with wk collection filter; see fix in PR here: https://github.com/paleolimbot/wk/pull/194

Installing from above branch (remotes::install_github("brownag/wk@195154ed572f33da2e44c4f9428646127edcc6c1")) should give the expected results with the problematic example:

soilDB::SDA_spatialQuery(p2)
#>    mukey                                               muname
#> 1 464412 Pollasky-Montpellier complex, 9 to 15 percent slopes

wk::wk_collection(wk::as_wkt(p2))
#> <wk_wkt[1] with CRS=EPSG:4326>
#> [1] GEOMETRYCOLLECTION (POINT (-119.7233 36.92204), POINT (-119.7233 36.92204))
dschlaep commented 1 year ago

Indeed, your wk fix fixed this for me. Sorry for mis-attributing the issue to soilDB! Thanks for looking into it and relaying it to wk!

brownag commented 1 year ago

Indeed, your wk fix fixed this for me. Sorry for mis-attributing the issue to soilDB! Thanks for looking into it and relaying it to wk!

Great! No problem, glad that you pointed the issue out and it was straightforward to correct.

With these web service functions that take various user inputs it can be a while before we see errors pop up in unit tests, if at all, especially when a new version of a package breaks something, so don't hesitate to post an issue if something is not behaving.

A work-around for using the current CRAN wk with multiple input features would be to use byFeature=TRUE (which processes a request for each individual feature in the input geometry). This seems to get around the above issue with the indexing in the wk collection filter, but can be a lot slower as each feature is queried independently, and in the case where you have multiple points per MUKEY you will get "duplicate" records, one for each point--which may or may not be what you want.