ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 20 forks source link

fetchSDA_spatial: add new T-SQL aggregate geometry methods #299

Closed brownag closed 1 year ago

brownag commented 1 year ago

Adds new options ("extent", "convexhull", "union", and "collection") for method argument to fetchSDA_spatial(). These options perform various aggregations on the mupolygon or sapolygon geometry on the server side, returning a simplified result.

Using the T-SQL static aggregate geometry methods:

Demonstration:

suppressPackageStartupMessages(library(soilDB))

# default behavior: 2 bbox for extent, one for each mukey (in this case these have same nationalmusym)
x <- fetchSDA_spatial(
  x = c(2440240, 2924995),
  method = "extent"
) 
#> Using 1 chunks...
#> Chunk #1 completed (n = 2; 1 secs)
#> Done in 1 secs; mean/chunk: 1 secs; mean/symbol: 0.5 secs.
x
#> Simple feature collection with 2 features and 3 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -120.8787 ymin: 37.47041 xmax: -119.9521 ymax: 38.30401
#> Geodetic CRS:  WGS 84
#>     mukey areasymbol nationalmusym                           geom
#> 1 2440240      CA630         2mx8f POLYGON ((-120.8787 37.7260...
#> 2 2924995      CA649         2mx8f POLYGON ((-120.3299 37.4704...
plot(x$geom)

# if by.col="nationalmusym" only one result is returned. in this case using method="convexhull"
y <- fetchSDA_spatial(
  x = "2mx8f",
  by.col = "nationalmusym",
  method = "convexhull"
)
#> Using 1 chunks...
#> Chunk #1 completed (n = 2; 0.2 secs)
#> Done in 0.4 secs; mean/chunk: 0.2 secs; mean/symbol: 0.19 secs.
y
#> Simple feature collection with 1 feature and 3 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -120.8787 ymin: 37.47041 xmax: -119.9521 ymax: 38.30401
#> Geodetic CRS:  WGS 84
#>              mukey   areasymbol nationalmusym                           geom
#> 1 2440240, 2924995 CA630, CA649         2mx8f POLYGON ((-119.9521 37.4998...

# note that the convex hull includes areas that are part of both constituent mukey bboxes
plot(y$geom, add=TRUE, col="RED")

# aggregation methods apply equivalently to the sapolygon geometries
z <- fetchSDA_spatial(
  x = c("CA740", "CA732", "CA802"),
  by.col = "areasymbol",
  geom = "sapolygon",
  method = "convexhull"
)
#> Using 1 chunks...
#> Chunk #1 completed (n = 3; 0.3 secs)
#> Done in 0.4 secs; mean/chunk: 0.3 secs; mean/symbol: 0.13 secs.
plot(z)

# using method="union" you can get 3 multipart survey areas returned as 3 MULTIPOLYGON
z <- fetchSDA_spatial(
  x = c("CA740", "CA732", "CA802"),
  by.col = "areasymbol",
  geom = "sapolygon",
  method = "union"
)
#> Using 1 chunks...
#> Chunk #1 completed (n = 3; 3.1 secs)
#> Done in 3.2 secs; mean/chunk: 3.1 secs; mean/symbol: 1.07 secs.
z
#> Simple feature collection with 3 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -119.3903 ymin: 36.03901 xmax: -117.7945 ymax: 38.2094
#> Geodetic CRS:  WGS 84
#>    lkey areasymbol                           geom
#> 1 14164      CA732 MULTIPOLYGON (((-118.8043 3...
#> 2 14165      CA740 MULTIPOLYGON (((-119.248 37...
#> 3 14244      CA802 MULTIPOLYGON (((-119.1054 3...
dylanbeaudette commented 1 year ago

The following map unit keys are associated with very complex polygons: ('1598653', '1598670', '162786', '1413444', '403308', '2605765', '161428', '808535').

Here are some stats / point-on-surface coordinates for some delineations tied to these map unit keys. Anything with a fractal dimension ("fd") > 1.3 is either very complex, or some kind of geometric error (e.g. sliver polygon).

 areasymbol |  mukey  | ogc_fid  |        fd         |       st_x        |       st_y
------------+---------+----------+-------------------+-------------------+------------------
 in069      | 161428  |  9937163 |  1.48328711710149 |  -85.416953776054 | 40.7499389755057
 in069      | 161428  |  9948437 |  1.46487512704564 |  -85.482530692046 | 40.7068578216985
 la075      | 1598653 | 12399549 |  1.46163071003991 | -89.9114833563087 | 29.4975338125364
 la075      | 1598653 | 12390850 |  1.45988530180611 | -89.4456791098703 | 29.1909038539091
 in069      | 161428  |  9947035 |  1.45958506507375 | -85.4776080088492 | 40.9512008459263
 la057      | 808535  | 12358142 |  1.44351990633671 | -90.9052395795618 | 29.7535805073645
 la075      | 1598653 | 12399094 |   1.4377956041352 | -89.4571619816923 | 29.1988396856297
 ia021      | 403308  |  4957979 |  1.43676407697544 | -95.0402802568294 | 42.7394000845724
 la057      | 808535  | 12357522 |  1.43548782699027 | -90.5702108833809 | 29.7445771122274
 in069      | 161428  |  9937364 |  1.42887746607053 |  -85.600613115432 | 40.9457825632656
 in069      | 161428  |  9936537 |  1.42302224162814 |  -85.640465571175 | 40.9778259842066
 la057      | 808535  | 12358490 |  1.42189171043039 | -90.5196409691703 | 29.6089347799457
 la057      | 808535  | 12358986 |  1.41932662856896 | -90.2435096155968 | 29.4243124877804
 la057      | 808535  | 12356857 |  1.40927152230042 | -90.5504866246769 | 29.7146190633603
 la075      | 1598653 | 12403712 |  1.40641834934647 |  -89.617586937944 | 29.3005355683554
 la057      | 808535  | 12357703 |  1.40373102348149 | -90.1078195510741 | 29.4591174179254
 la057      | 808535  | 12357687 |  1.40208424133049 | -90.5174479199878 | 29.7147878087805
 in069      | 161428  |  9938261 |  1.40206214445821 | -85.4262303400103 |   40.86742435906
 la057      | 808535  | 12355116 |  1.40001693367425 |  -90.596932159237 | 29.6824435958016
 la057      | 808535  | 12355102 |  1.39822321741683 | -90.6061649657523 | 29.6832350107081
 in069      | 161428  |  9943436 |  1.38400867153309 | -85.5537482314007 | 40.9908107561344
brownag commented 1 year ago

Thanks for pointing out those mukeys, @dylanbeaudette!

I think the fractal dimension was not the issue here (for the new aggregation queries) but rather the sheer number of polygons and the clumsy way I was aggregating multiple mukey/areasymbol per nationalmusym. This works now, and will generalize for full add.fields support, but I may opt to do this aggregation in a more efficient way.

I've pushed a fix (https://github.com/ncss-tech/soilDB/pull/299/commits/368c72722add749b5776bf4b772c24cef0f27efd), and now I am able to do e.g. convex hull of your problem mukeys almost 100x faster than the full geometries. e.g.

suppressPackageStartupMessages(library(soilDB))

x <- c('1598653', '1598670', '162786', '1413444', '403308', '2605765', '161428', '808535')

y <- fetchSDA_spatial(x, method = "convexhull")
#> Using 1 chunks...
#> Chunk #1 completed (n = 8; 4.1 secs)
#> Done in 4.1 secs; mean/chunk: 4.1 secs; mean/symbol: 0.51 secs.

plot(y$geom)


z <- fetchSDA_spatial(x)
#> Using 1 chunks...
#> Chunk #1 completed (n = 8; 3 secs)
#> Done in 2.96 mins; mean/chunk: 177.9 secs; mean/symbol: 22.24 secs.
dylanbeaudette commented 1 year ago

Nice. having these generalization tools easily available is a great addition.