isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/

Small overhead for `{terra}` objects #81

Closed by kadyb 2 years ago

kadyb commented 2 years ago

In the new version of {exactextractr} (0.8.2) there seems to be a small overhead for {terra} objects, so the {raster} backend is marginally faster. In the previous version (0.7.2) the timings were very close. Please close this issue if the difference is irrelevant.

library("sf")
library("terra")
library("raster")
library("exactextractr")

# vector data (10193 features)
brazil = read_sf("https://geodata.ucdavis.edu/gadm/gadm4.0/kmz/gadm40_BRA_3.kmz")

# raster data (4320 x 8640 pixels, 12 layers)
download.file("https://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_2.5m_prec.zip",
              "prec_2.5m.zip")
unzip("prec_2.5m.zip", exdir = "prec_2.5m")
f = list.files("prec_2.5m", pattern = "\\.tif$", full.names = TRUE)

### terra backend ###
prec = rast(f)
prec = prec * 1

system.time(
  exact_extract(prec, brazil, fun = "mean", progress = FALSE)
)

## 0.8.2
#>   user  system elapsed
#> 17.164   0.692  17.857

## 0.7.2
#>   user  system elapsed
#> 15.138   0.021  15.160

### raster backend ###
prec = stack(f)
prec = readAll(prec)

system.time(
  exact_extract(prec, brazil, fun = "mean", progress = FALSE)
)

## 0.8.2
#>   user  system elapsed
#> 14.707   0.023  14.732

## 0.7.2
#>   user  system elapsed
#> 14.709   0.009  14.719

dbaston commented 2 years ago

The difference in this case may be that the terra version of prec is not actually loaded into memory:

> ### terra backend ###
> prec = rast(f)
> prec = prec * 1

> terra::inMemory(prec[[1]])
[1] FALSE

while the raster version is:

> prec = stack(f)
> prec = readAll(prec)
> raster::inMemory(prec)
[1] TRUE

This alternative seems to force terra to bring things into memory:

> prec = rast(f)
> values(prec) <- values(prec)
> terra::inMemory(prec)
[1] TRUE

kadyb commented 2 years ago

Surprisingly, I do see it loaded into memory (terra v1.5.34):

prec = rast(f)
prec = prec * 1
terra::inMemory(prec)
#> [1] TRUE
terra::inMemory(prec[[1]])
#> [1] TRUE

With the values forced into memory, there is still a small difference between the two backends:

prec = rast(f)
values(prec) = values(prec)
system.time(exact_extract(prec, brazil, fun = "mean", progress = FALSE))
#>   user  system elapsed
#> 16.276   0.345  16.623

prec = stack(f)
prec = readAll(prec)
system.time(exact_extract(prec, brazil, fun = "mean", progress = FALSE))
#>   user  system elapsed
#> 14.321   0.015  14.339

kadyb commented 2 years ago

You probably don't have enough RAM to load this data. I can reproduce your result by lowering terra's memory fraction:

prec = rast(f)
prec = prec * 1
inMemory(prec)
#> [1] TRUE

terraOptions(memfrac = 0.1)
prec = rast(f)
prec = prec * 1
inMemory(prec)
#> [1] FALSE
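
Since the in-memory state depends on available RAM and on `memfrac`, it may be worth checking it explicitly before timing anything. Below is a minimal sketch, assuming {terra} is installed. `assert_in_memory` is a hypothetical helper (not part of {terra} or {exactextractr}), and the tiny raster is only a stand-in for the WorldClim layers used above.

```r
library(terra)

# Hypothetical helper: stop early if any source of a SpatRaster is still on
# disk, so that a benchmark does not silently include file reads.
assert_in_memory <- function(x) {
  if (!all(terra::inMemory(x))) {
    stop("raster is not fully in memory; timings would include disk reads")
  }
  invisible(x)
}

# Stand-in data: a small raster written to a temporary GeoTIFF
tmp <- tempfile(fileext = ".tif")
writeRaster(rast(nrows = 10, ncols = 10, vals = 1:100), tmp)

on_disk <- rast(tmp)    # file-backed, nothing read yet
in_mem  <- on_disk * 1  # small enough that terra computes it in memory

in_mem <- assert_in_memory(in_mem)  # passes; assert_in_memory(on_disk) would stop
```

Calling the helper right before `system.time()` makes a mismatch between the two backends visible instead of showing up only as a timing difference.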
dbaston commented 2 years ago

This is an interesting benchmark in that the runtime seems dominated by ... terra::ext and terra::res?

If you'd like to try a tentative fix, you can install it with:

remotes::install_git('https://gitlab.com/isciences/exactextractr', ref='reduce-terra-oh')
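
A breakdown of where the runtime goes can be obtained with R's built-in sampling profiler. The following is a minimal sketch using `Rprof()` and `summaryRprof()` from {utils}. `top_self_time` and the stand-in workload are hypothetical, and this is not necessarily how the profile mentioned above was produced.

```r
# Hypothetical helper: profile a zero-argument function and return the
# functions with the largest self time, as reported by summaryRprof().
top_self_time <- function(workload, n = 5, interval = 0.01) {
  prof_file <- tempfile(fileext = ".out")
  on.exit(unlink(prof_file), add = TRUE)

  Rprof(prof_file, interval = interval)          # start sampling
  tryCatch(workload(), finally = Rprof(NULL))    # always stop sampling

  head(summaryRprof(prof_file)$by.self, n)
}

# Stand-in workload so the sketch runs without the data from this issue
top_self_time(function() for (i in 1:200) sort(runif(1e5)))

# With the objects from the benchmark it would look like:
# top_self_time(function() exact_extract(prec, brazil, fun = "mean", progress = FALSE))
```

The profiler samples R-level calls, so time spent in compiled code is attributed to the R function that invoked it.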
kadyb commented 2 years ago

Great, I see a significant speedup with this PR on my PC.

## this PR
#>    user  system elapsed 
#>   5.555   0.017   5.572

## 0.8.2
#>   user  system elapsed
#> 17.164   0.692  17.857

## 0.7.2
#>   user  system elapsed
#> 15.138   0.021  15.160

For context, I first noticed this overhead on a smaller dataset when updating {exactextractr} from 0.7.2 to 0.8.2 and {terra} from 1.5.17 to 1.5.34.

# median of 10 iterations
exactextractr (raster):  2.88 s -> 2.98 s
exactextractr (terra):  2.97 s -> 5.03 s
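
A median over repeated runs like the one above can be computed with base R alone. This is a minimal sketch, not the exact script behind those numbers. `median_elapsed` is a hypothetical helper and the workload is a stand-in.

```r
# Hypothetical helper: run `f` (a zero-argument function) `n` times and
# return the median elapsed wall-clock time in seconds.
median_elapsed <- function(f, n = 10) {
  elapsed <- vapply(
    seq_len(n),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Stand-in workload so the sketch runs without the data from this issue
median_elapsed(function() sum(sqrt(seq_len(1e6))))

# With the objects from the benchmark it would look like:
# median_elapsed(function() exact_extract(prec, brazil, fun = "mean", progress = FALSE))
```

Using the median rather than the mean keeps a single slow run (for example, a cold file cache) from skewing the comparison.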
dbaston commented 2 years ago

Thanks for testing this and sharing the benchmark. Different usages can have very different performance characteristics (raster vs terra, GeoTIFF vs netCDF, small polygons vs large polygons, single rasters vs stacks, different chunking schemes, etc.), so it is easy for regressions to sneak in.

The fix depends on some unrelated work, so I'll merge it once that is complete.

dbaston commented 2 years ago

Addressed with 5b88ecf7ecfa2169a6965d6ca5bc

kadyb commented 2 years ago

Thank you!