mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/
GNU General Public License v3.0

improve speed of GFW calculations #225

Closed goergen95 closed 5 months ago

goergen95 commented 6 months ago

This PR reworks the GFW routines and improves the speed of calculation up to 10 times by introducing two main changes. The first is that exactextractr is now required for the indicator calculations. It is still listed under Suggests, but users are told to install it if requireNamespace() returns FALSE.
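A minimal sketch of such a guard for a suggested package, using only base R. The helper name `check_suggested()` and the wording of the message are assumptions for illustration, not the code in this PR, and whether the real code stops or only prints a message is not stated here.

```r
# Hypothetical helper (not the package's actual code): fail early with an
# informative message when a package listed under Suggests is missing.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required for this indicator. ", pkg),
      sprintf("Install it with install.packages(\"%s\").", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this call passes silently
check_suggested("stats")
```

An indicator function would call the guard once at the top, for example `check_suggested("exactextractr")`, before doing any raster work.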

The main speed improvement, however, comes from relying on landscapemetrics::get_patches() instead of terra::patches(). Thus, landscapemetrics is now also listed under Suggests, but the function code will fall back to terra if it is not installed and issue a message advising the user to install landscapemetrics for better computation times.
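The fallback described here could look roughly like the sketch below. The function names `select_patch_backend()` and `label_patches()` are hypothetical, as are the chosen `class` and `directions` values; the actual implementation in the PR may differ. Note that the two backends return different structures (a SpatRaster from terra, a nested list from landscapemetrics), so real code would need to harmonise them.

```r
# Hypothetical sketch, not the package's actual code.
# Decide which patch-detection backend to use and inform the user on fallback.
select_patch_backend <- function(
    has_lsm = requireNamespace("landscapemetrics", quietly = TRUE)) {
  if (has_lsm) {
    return("landscapemetrics")
  }
  message(
    "Package 'landscapemetrics' is not installed; falling back to ",
    "terra::patches(). Install 'landscapemetrics' for better computation times."
  )
  "terra"
}

# Dispatch to the selected backend. `x` is assumed to be a binary terra
# SpatRaster (1 = forest, NA elsewhere). Return structures differ by backend.
label_patches <- function(x, backend = select_patch_backend()) {
  switch(backend,
    landscapemetrics = landscapemetrics::get_patches(x, class = 1, directions = 8),
    terra = terra::patches(x, directions = 8, zeroAsNA = TRUE)
  )
}

# The selector can be exercised without either package being installed
select_patch_backend(has_lsm = FALSE)
```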

fBedecarrats commented 6 months ago

Oh, man! That looks awesome!!!!

codecov[bot] commented 6 months ago

Codecov Report

Attention: 33 lines in your changes are missing coverage. Please review.

Comparison is base (5b361b1) 76.01% compared to head (ade94a4) 75.33%.

:exclamation: Current head ade94a4 differs from pull request most recent head e5cdfdf. Consider uploading reports for the commit e5cdfdf to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| R/calc_treecover_area.R | 83.33% | 16 Missing :warning: |
| R/calc_treecover_area_and_emissions.R | 79.41% | 7 Missing :warning: |
| R/calc_treecoverloss_emissions.R | 78.26% | 5 Missing :warning: |
| R/utils.R | 16.66% | 5 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #225      +/-   ##
==========================================
- Coverage   76.01%   75.33%   -0.68%
==========================================
  Files          49       49
  Lines        1926     1845      -81
==========================================
- Hits         1464     1390      -74
+ Misses        462      455       -7
```


goergen95 commented 6 months ago

Just to provide some evidence, here is a somewhat extreme use case. I simply took the bounding box of a large PA in Brazil (WDPAID = 33613) and compared the two routines.

Results: The new routine is 51 times faster on my machine for this AOI, and the difference in the treecover estimate for the year 2000 is about 0.6% (3543 ha)! :tada:

Current routine:

remotes::install_github("mapme-initiative/mapme.biodiversity", ref = "main")
#> Skipping install of 'mapme.biodiversity' from a github remote, the SHA1 (7b0a2173) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.8.1, PROJ 9.1.1; sf_use_s2() is TRUE
library(mapme.biodiversity)

aoi <- "POLYGON ((-62.2827 -13.5376, -61.1715 -13.5376, -61.1715 -12.8, -62.2827 -12.8, -62.2827 -13.5376))"
aoi <- st_as_sfc(aoi, crs = st_crs("EPSG:4326")) |> st_as_sf()
area <- st_area(aoi)
units::set_units(area, "km²")
#> 9867.781 [km²]

outdir <- tempfile()
dir.create(outdir)
aoi <- init_portfolio(aoi, years = 2000:2022, outdir = outdir)

aoi <- get_resources(aoi, resources = c("gfw_treecover", "gfw_lossyear"),
                     vers_treecover = "GFC-2022-v1.10",
                     vers_lossyear = "GFC-2022-v1.10")
#> Starting process to download resource 'gfw_treecover'........
#> Starting process to download resource 'gfw_lossyear'........

timing <- system.time(aoi <- calc_indicators(aoi, "treecover_area"))
#> Argument 'min_size' for resource 'treecover_area' was not specified. Setting to default value of '10'.
#> Argument 'min_cover' for resource 'treecover_area' was not specified. Setting to default value of '35'.
aoi$treecover_area[[1]]
#> # A tibble: 23 × 2
#>    years treecover
#>    <int>     <dbl>
#>  1  2000   610823.
#>  2  2001   610089.
#>  3  2002   608337.
#>  4  2003   600961.
#>  5  2004   597900.
#>  6  2005   592184.
#>  7  2006   590614.
#>  8  2007   589804.
#>  9  2008   588828.
#> 10  2009   588093.
#> # ℹ 13 more rows
timing
#>     user   system  elapsed 
#> 1065.052   17.670 1082.735

Created on 2023-12-19 with reprex v2.0.2

New routine:

remotes::install_github("mapme-initiative/mapme.biodiversity", ref = "speed-up-gfw-routines")
#> Skipping install of 'mapme.biodiversity' from a github remote, the SHA1 (2bb12db1) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.8.1, PROJ 9.1.1; sf_use_s2() is TRUE
library(mapme.biodiversity)

aoi <- "POLYGON ((-62.2827 -13.5376, -61.1715 -13.5376, -61.1715 -12.8, -62.2827 -12.8, -62.2827 -13.5376))"
aoi <- st_as_sfc(aoi, crs = st_crs("EPSG:4326")) |> st_as_sf()
area <- st_area(aoi)
units::set_units(area, "km²")
#> 9867.781 [km²]

outdir <- tempfile()
dir.create(outdir)
aoi <- init_portfolio(aoi, years = 2000:2022, outdir = outdir)

aoi <- get_resources(aoi, resources = c("gfw_treecover", "gfw_lossyear"),
                     vers_treecover = "GFC-2022-v1.10",
                     vers_lossyear = "GFC-2022-v1.10")
#> Starting process to download resource 'gfw_treecover'........
#> Starting process to download resource 'gfw_lossyear'........

timing <- system.time(aoi <- calc_indicators(aoi, "treecover_area"))
#> Argument 'min_size' for resource 'treecover_area' was not specified. Setting to default value of '10'.
#> Argument 'min_cover' for resource 'treecover_area' was not specified. Setting to default value of '35'.
aoi$treecover_area[[1]]
#> # A tibble: 23 × 2
#>    years treecover
#>    <int>     <dbl>
#>  1  2000   614366.
#>  2  2001   613628.
#>  3  2002   611866.
#>  4  2003   604446.
#>  5  2004   601365.
#>  6  2005   595615.
#>  7  2006   594036.
#>  8  2007   593221.
#>  9  2008   592239.
#> 10  2009   591500.
#> # ℹ 13 more rows
timing
#>    user  system elapsed 
#>  16.653   4.430  21.083

Created on 2023-12-19 with reprex v2.0.2
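The headline figures in this comment can be re-derived from the two reprex outputs above. The sketch below simply copies the elapsed times and the year-2000 treecover values from those outputs; the variable names are illustrative.

```r
# Elapsed times (seconds) copied from the two system.time() outputs above
elapsed_main <- 1082.735
elapsed_new  <- 21.083

# Treecover (ha) for the year 2000, first row of each tibble
treecover_main <- 610823
treecover_new  <- 614366

speedup  <- elapsed_main / elapsed_new
abs_diff <- treecover_new - treecover_main
rel_diff <- 100 * abs_diff / treecover_main

round(speedup, 1)   # 51.4
abs_diff            # 3543
round(rel_diff, 2)  # 0.58
```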

goergen95 commented 6 months ago

Here is another comparison, this time using a grid. The results indicate that processing time is roughly halved; the difference in the area estimate for grid cell number 6 is about 1.7% (8 ha).

Current routine:

remotes::install_github("mapme-initiative/mapme.biodiversity", ref = "main")
#> Skipping install of 'mapme.biodiversity' from a github remote, the SHA1 (7b0a2173) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.8.1, PROJ 9.1.1; sf_use_s2() is TRUE
library(future)
library(progressr)
library(mapme.biodiversity)

aoi <- "POLYGON ((-62.2827 -13.5376, -61.1715 -13.5376, -61.1715 -12.8, -62.2827 -12.8, -62.2827 -13.5376))"
aoi <- st_as_sfc(aoi, crs = st_crs("EPSG:4326")) |> st_as_sf()
aoi <- st_make_grid(aoi, cellsize = c(0.025, 0.025)) |> st_as_sf()
area <- st_area(aoi)
mean(units::set_units(area[1], "km²"))
#> 7.513411 [km²]

outdir <- tempfile()
dir.create(outdir)
aoi <- init_portfolio(aoi, years = 2000:2022, outdir = outdir)

aoi <- get_resources(aoi, resources = c("gfw_treecover", "gfw_lossyear"),
                     vers_treecover = "GFC-2022-v1.10",
                     vers_lossyear = "GFC-2022-v1.10")
#> Starting process to download resource 'gfw_treecover'........
#> Starting process to download resource 'gfw_lossyear'........

plan(multisession, workers = 6)
with_progress({
  timing <- system.time(aoi <- calc_indicators(aoi, "treecover_area"))
})
#> Argument 'min_size' for resource 'treecover_area' was not specified. Setting to default value of '10'.
#> Argument 'min_cover' for resource 'treecover_area' was not specified. Setting to default value of '35'.
plan(sequential)
aoi$treecover_area[[6]]
#> # A tibble: 23 × 2
#>    years treecover
#>    <int>     <dbl>
#>  1  2000      452.
#>  2  2001      452.
#>  3  2002      450.
#>  4  2003      450.
#>  5  2004      450.
#>  6  2005      450.
#>  7  2006      450.
#>  8  2007      450.
#>  9  2008      450.
#> 10  2009      450.
#> # ℹ 13 more rows
timing
#>    user  system elapsed 
#>   6.804   0.312 172.218

Created on 2023-12-21 with reprex v2.0.2

New routine:

remotes::install_github("mapme-initiative/mapme.biodiversity", ref = "speed-up-gfw-routines")
#> Skipping install of 'mapme.biodiversity' from a github remote, the SHA1 (2bb12db1) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.8.1, PROJ 9.1.1; sf_use_s2() is TRUE
library(future)
library(progressr)
library(mapme.biodiversity)

aoi <- "POLYGON ((-62.2827 -13.5376, -61.1715 -13.5376, -61.1715 -12.8, -62.2827 -12.8, -62.2827 -13.5376))"
aoi <- st_as_sfc(aoi, crs = st_crs("EPSG:4326")) |> st_as_sf()
aoi <- st_make_grid(aoi, cellsize = c(0.025, 0.025)) |> st_as_sf()
area <- st_area(aoi)
mean(units::set_units(area[1], "km²"))
#> 7.513411 [km²]

outdir <- tempfile()
dir.create(outdir)
aoi <- init_portfolio(aoi, years = 2000:2022, outdir = outdir)

aoi <- get_resources(aoi, resources = c("gfw_treecover", "gfw_lossyear"),
                     vers_treecover = "GFC-2022-v1.10",
                     vers_lossyear = "GFC-2022-v1.10")
#> Starting process to download resource 'gfw_treecover'........
#> Starting process to download resource 'gfw_lossyear'........

plan(multisession, workers = 6)
with_progress({
  timing <- system.time(aoi <- calc_indicators(aoi, "treecover_area"))
})
#> Argument 'min_size' for resource 'treecover_area' was not specified. Setting to default value of '10'.
#> Argument 'min_cover' for resource 'treecover_area' was not specified. Setting to default value of '35'.
plan(sequential)
aoi$treecover_area[[6]]
#> # A tibble: 23 × 2
#>    years treecover
#>    <int>     <dbl>
#>  1  2000      444.
#>  2  2001      444.
#>  3  2002      443.
#>  4  2003      443.
#>  5  2004      443.
#>  6  2005      443.
#>  7  2006      443.
#>  8  2007      443.
#>  9  2008      443.
#> 10  2009      443.
#> # ℹ 13 more rows
timing
#>    user  system elapsed 
#>   3.717   0.138  88.381

Created on 2023-12-21 with reprex v2.0.2
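The same check for the grid case, again using only numbers printed in the two outputs above. Because the tibbles print rounded values (452 and 444 ha), the relative difference computed here is only approximate.

```r
# Elapsed times (seconds) for the gridded AOI with 6 workers
elapsed_main <- 172.218
elapsed_new  <- 88.381

# Printed (rounded) treecover in ha for grid cell 6, year 2000
cell6_main <- 452
cell6_new  <- 444

round(elapsed_main / elapsed_new, 2)                   # 1.95
cell6_main - cell6_new                                 # 8
round(100 * (cell6_main - cell6_new) / cell6_main, 2)  # 1.77
```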

goergen95 commented 5 months ago

Thanks, I included some styling and linting. Anyway, before merging, I think we should disable the tests for numerical stability in case {landscapemetrics} is not installed.
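One common way to do this with testthat is `skip_if_not_installed()`, which skips a test instead of failing it when a package is unavailable. The sketch below is an assumption about how such a test could be guarded, not the test code in this PR; `toy_indicator()` is a hypothetical stand-in for a real indicator call.

```r
library(testthat)

# Hypothetical stand-in for an indicator value computed with the
# landscapemetrics backend; the real tests would call the package functions.
toy_indicator <- function() sum(c(0.25, 0.5, 0.25))

test_that("indicator value is numerically stable (landscapemetrics backend)", {
  # Skipped, not failed, when landscapemetrics is missing
  skip_if_not_installed("landscapemetrics")
  expect_equal(toy_indicator(), 1, tolerance = 1e-8)
})
```

With this guard, the stability check still runs wherever the faster backend is installed, and machines without it report a skip.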

goergen95 commented 5 months ago

Something is wrong with Posit's package management servers (see here). Will re-run the checks once this has settled.

karpfen commented 5 months ago

> Something wrong with Posit's package management servers (see here). Will re-run checks once this has settled.

FYI, it seems to be fixed now: https://fosstodon.org/@jvroberts/111773434263658017

goergen95 commented 5 months ago

Yep, thanks. I already re-ran the checks and they seem to pass. I have now included some serious refactoring, and it makes the code so much more readable. You might want to take another look? I am still thinking about refactoring the associated tests quite a bit as well.

goergen95 commented 5 months ago

The tests are now refactored as well; I would be happy with merging this.

karpfen commented 5 months ago

@goergen95 What do you think of replacing the installation instructions in the README with remotes::install_github("https://github.com/mapme-initiative/mapme.biodiversity", dependencies = TRUE) for the development version and install.packages("mapme.biodiversity", dependencies = TRUE) for the released version?

That way the landscapemetrics dependency may become a bit more accessible.
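With `dependencies = TRUE`, both installers also pull in the packages listed under Suggests. A quick way to confirm afterwards that the two suggested packages named in this PR are available is sketched below; it uses only base R and is an illustration, not part of the proposed README text.

```r
# Suggested packages named in this PR
suggested <- c("exactextractr", "landscapemetrics")

# TRUE where the package can be loaded, FALSE otherwise
available <- vapply(suggested, requireNamespace, logical(1), quietly = TRUE)
print(available)

# Report anything that is still missing
missing_pkgs <- names(available)[!available]
if (length(missing_pkgs) > 0) {
  message("Missing suggested packages: ", paste(missing_pkgs, collapse = ", "))
}
```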

goergen95 commented 5 months ago

Nice! That was definitely one of the headaches I still had with merging this. 😉

goergen95 commented 5 months ago

Btw I actually see the possibility to use landscapemetrics for, well, landscape metrics indicators in the future. Once we get a user request for this we might actually implement the respective indicator functions.