I get a roughly 50% decrease in average computation time with:
# remotes::install_github("mapme-initiative/mapme.biodiversity", ref = "main")
# remotes::install_github("mapme-initiative/mapme.biodiversity", ref = "improve-runtime-for-small-assets")
library(mapme.biodiversity)
library(sf)
library(microbenchmark)

outdir <- file.path(tempdir(), "mapme.data")
dir.create(outdir)
mapme.biodiversity:::.copy_resource_dir(outdir)

x <- read_sf(
  system.file("extdata", "gfw_sample.gpkg", package = "mapme.biodiversity")
)

mapme_options(
  outdir = outdir,
  verbose = TRUE
)

x <- get_resources(
  x,
  get_gfw_treecover(version = "GFC-2020-v1.8"),
  get_gfw_lossyear(version = "GFC-2020-v1.8")
)

# call once to load namespace
calc_indicators(
  x,
  calc_treecover_area(years = 2000:2005, min_size = 5, min_cover = 30)
)

microbenchmark(
  branch = {
    calc_indicators(
      x,
      calc_treecover_area(years = 2000:2005, min_size = 5, min_cover = 30)
    )
  }
)
For main (d94fad068) I get:
Unit: milliseconds
   expr      min       lq     mean   median      uq      max neval
 branch 384.5842 388.6725 392.8769 390.0362 393.276 598.3322   100
versus:
Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval
 branch 159.6032 171.4363 182.1979 175.1873 180.3777 482.6047   100
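As a quick sanity check on the ~50% figure, the relative change of the mean runtimes reported above (values copied from the two outputs) can be computed directly:

1 - 182.1979 / 392.8769  # ~0.54, i.e. slightly more than a 50% reduction in mean runtime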
@karpfen: Would you mind confirming?
Adapting the above script to process 100 assets via:
mapme_options(
  outdir = outdir,
  verbose = FALSE
)

x <- get_resources(
  x,
  get_gfw_treecover(version = "GFC-2020-v1.8"),
  get_gfw_lossyear(version = "GFC-2020-v1.8")
)

# replicate the single asset 100 times (list_rbind() is from purrr)
x <- st_as_sf(purrr::list_rbind(lapply(1:100, function(i) x)))
x$assetid <- 1:nrow(x)

microbenchmark(
  branch = {
    calc_indicators(
      x,
      calc_treecover_area(years = 2000:2005, min_size = 5, min_cover = 30)
    )
  },
  times = 10
)
yields on main:
Unit: seconds
   expr      min       lq    mean   median       uq      max neval
 branch 17.42098 17.43061 17.6243 17.53582 17.68839 18.15672    10
versus:
Unit: seconds
   expr     min       lq     mean   median       uq      max neval
 branch 8.81958 8.903752 9.088585 9.049905 9.153315 9.738127    10
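The same check for the 100-asset run, again using the reported means:

1 - 9.088585 / 17.6243  # ~0.48, so the roughly 50% reduction also holds for 100 assets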
Btw., x covers roughly 2,300 ha, which is larger than 80% of the assets in the portfolio we were discussing today.
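For reference, that figure can be checked directly from the geometry; this one-liner is not part of the benchmark script and assumes the units package is installed:

# area of the asset in hectares (illustrative check only)
units::set_units(sf::st_area(x), "ha")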
Wow, I didn't expect that to work so quickly, nice one :) I'll try this out later today
It would be nice if you could run the example code to see whether the improvement holds on another machine. Note that the speed-up in a real use case will depend very much on both the structure of the assets (e.g. are there many multi-polygons?) and on how you set up parallelization (number of cores at the asset level vs. the chunk level; parts of multi-polygons are also processed as chunks).
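For illustration, a minimal sketch of one possible parallel setup using the future framework, which mapme.biodiversity relies on for parallelization; the nested topology and the worker counts below are assumptions for demonstration, not a recommendation:

library(future)
# outer level: assets; inner level: chunks (e.g. parts of multi-polygons)
# worker counts are placeholders and need to be adapted to the machine at hand
plan(list(
  tweak(multisession, workers = 2),  # asset level
  tweak(multisession, workers = 4)   # chunk level
))
calc_indicators(
  x,
  calc_treecover_area(years = 2000:2005, min_size = 5, min_cover = 30)
)
plan(sequential)  # reset to sequential processing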