ncss-tech / soilReports

An R package that assists with the setup and operation of a collection of soil data summary, comparison, and evaluation reports. These reports are primarily used by USDA-NRCS soil scientists in both initial and update mapping.
15 stars 5 forks source link

consider exactextractr::exact_extract for raster extraction #104

Open brownag opened 4 years ago

brownag commented 4 years ago

This is a note to consider use of exactextractr package for faster raster extraction. I know we are expecting terra to be available in the near future, not sure how the performance of these will compare. My preliminary tests show extraction with terra operating at similar speeds to raster, maybe a little faster.

It seems exactextractr::exact_extract is orders of magnitude faster than raster::extract. It is available on CRAN and relies on GEOS.

Here is a small example showing processing of a PRISM raster. Though, I think the performance enhancement really shows when using more detailed rasters.

library(raster)
library(exactextractr)
library(tigris)
library(terra)

# extraction of all 800m cells from a county
ca <- tigris::counties(state="ca", class="sf")
calaveras <- ca[ca$COUNTYFP == "009",]

ra1 <- raster::raster("/data/geodata/MUSum_PRISM/final_MAP_mm_800m.tif")

system.time(r1 <- raster::extract(ra1, calaveras))

system.time(t1 <- terra::extract(ra1, calaveras))

system.time(x1 <- exactextractr::exact_extract(ra1, calaveras))

system.time(r1 <- raster::extract(ra1, calaveras)) user system elapsed 3.493 0.173 3.792 system.time(t1 <- terra::extract(ra1, calaveras)) user system elapsed 2.281 0.251 2.606 system.time(x1 <- exactextractr::exact_extract(ra1, calaveras)) user system elapsed 0.586 0.000 0.734 Warning message: In .local(x, y, ...) : Polygons transformed from EPSG:4269 to EPSG:4269

Extraction of a small (10m) buffer around several point locations:

# extraction at/aruond points (investigation)
library(aqp)
library(soilDB)
library(sf)

data("loafercreek")
coordinates(loafercreek) <- ~ x_std + y_std
proj4string(loafercreek) <- "+proj=longlat +datum=WGS84"
ra2 <- raster("/data/geodata/ssro2_ann_beam_rad_int.tif")
loaf.spdf <- spTransform(as(loafercreek, 'SpatialPointsDataFrame'), 
                         CRS(proj4string(ra2)))

# create 10m buffer around points, no point extraction method in exact_extract
losp10 <- sf::st_buffer(sf::st_as_sf(loaf.spdf), dist=10)

system.time(r2 <- raster::extract(ra2, loaf.spdf))
system.time(t2 <- terra::extract(ra2, loaf.spdf))

# these take a _long_ time... n=106 tiny 20m diameter polygons
# system.time(r2 <- raster::extract(ra2, loaf.spdf, buffer=10))
# system.time(t2 <- terra::extract(ra2, loaf.spdf, buffer=10))

# in comparable time to simple point extraction, extract 10m radius 
system.time(x2 <- exactextractr::exact_extract(ra2, losp10, progress = FALSE))
x2[1:3]

# auto-weighted mean by contributing proportion ~ "point" values
system.time(x3 <- exactextractr::exact_extract(ra2, losp10, fun="mean",
                                               progress = FALSE))
x3[1:3]

Ordinary point extraction faster with raster/terra

system.time(r2 <- raster::extract(ra2, loaf.spdf)) user system elapsed 0.891 0.015 0.954 system.time(t2 <- terra::extract(ra2, loaf.spdf)) user system elapsed 0.833 0.023 0.879

But, you can create tiny polygons buffered around points and extract then nearly as fast with exactextractr. Calculating the weighted mean composition is an easy extension

in comparable time to simple point extraction, extract 10m radius:

system.time(x2 <- exactextractr::exact_extract(ra2, losp10, progress = FALSE)) user system elapsed 1.037 0.015 1.098 x2[1:3] [[1]] value coverage_fraction 1 60041 0.2454129 2 60393 0.1034935

[[2]] value coverage_fraction 1 64985 0.27749354 2 65777 0.07141285

[[3]] value coverage_fraction 1 64985 0.28094107 2 65777 0.06796531

Auto-weighted mean by contributing proportion ~ "point" values

system.time(x3 <- exactextractr::exact_extract(ra2, losp10, fun="mean",

  • progress = FALSE)) user system elapsed 0.982 0.007 1.036 x3[1:3] [1] 60145.41 65147.10 65139.28
dylanbeaudette commented 4 years ago

I like it. Anything that makes the report faster is a good thing. I would like to do some profiling on the sampling code to see if that or raster::extract is the bottle-neck. I'm not sure, but perhaps we can use this kind of sampling to assist with reasonable adjustments to an effective sampling size (e.g. #26).