mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/
GNU General Public License v3.0

`calc_treecover_area` returning unexpected empty tibble #255

Closed karpfen closed 1 month ago

karpfen commented 2 months ago

I was processing a bunch of polygons to calculate the GFW coverage and had quite a lot of missing values in the results. I'm not sure what caused this or if it is entirely reproducible, because when I tried to reproduce the error, for some regions I got sensible results when I processed them one-by-one.

I put together a gpkg of regions for which I consistently encounter the problem when I run it like this (I commented out the parallelization code; it didn't change the behavior):

library(sf)
library(mapme.biodiversity)

regions <- read_sf("reprex_regions.gpkg")

all(st_is_valid(regions))
all(st_geometry_type(regions) == "POLYGON")

datadir <- "C:/data/mapme.biodiversity/"
mapme_options(outdir = datadir, verbose = TRUE)

min_cover <- 1
min_size <- 1
t10 <- 2012
t5 <- 2017
t0 <- 2022

# Native pipe (R >= 4.1), so the script does not depend on a package that provides %>%
regions_pf <- regions |>
  get_resources(
    get_gfw_treecover("GFC-2022-v1.10"),
    get_gfw_lossyear("GFC-2022-v1.10")
  )

# Parallelization is commented out; it did not change the behavior.
# plan(multisession, workers = min(ncores, availableCores("multisession")))
# with_progress({
tc <- calc_indicators(
  regions_pf,
  calc_treecover_area(
    years = c(t10, t5, t0),
    min_size = min_size,
    min_cover = min_cover
  )
)
# })
# plan(sequential)
  # Found a column named 'assetid'. Overwritting its values with a unique identifier.
  # Warning messages:
  #   1: In .check_single_asset(result, i) :
  #   At asset 1 an error occured. Returning NA.
  # Error in df[[what]][index] <- as.numeric(my_sum) :
  #   NAs are not allowed in subscripted assignments
  #
  # 2: In .check_single_asset(result, i) :
  #   At asset 2 an error occured. Returning NA.
  # Error in df[[what]][index] <- as.numeric(my_sum) :
  #   NAs are not allowed in subscripted assignments
  #
  # 3: In .check_single_asset(result, i) :
  #   At asset 3 an error occured. Returning NA.
  # Error in eval(expr, p) : std::bad_alloc
  #
  # 4: In .check_single_asset(result, i) :
  #   At asset 4 an error occured. Returning NA.
  # Error in df[[what]][index] <- as.numeric(my_sum) :
  #   NAs are not allowed in subscripted assignments

tc$treecover_area[[1]]
# # A tibble: 1 × 1
#   value
#   <lgl>
# 1 NA

@goergen95, could you please have a look at this?

karpfen commented 2 months ago

Data + script here: reprex_error.zip

goergen95 commented 2 months ago

Processing these areas results in very large rasters (see below) that do not fit in memory on my machine. Maybe subdivide the polygons into smaller chunks and take the sum? That said, the error message below looks like something the package should catch. I will look into it tomorrow.

  # Error in df[[what]][index] <- as.numeric(my_sum) :
  #   NAs are not allowed in subscripted assignments

[[1]]
class       : SpatRaster 
dimensions  : 22631, 10821, 1  (nrow, ncol, nlyr)
resolution  : 0.00025, 0.00025  (x, y)
extent      : 39.64975, 42.355, 8.8405, 14.49825  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : vrt_Hansen_GFC-2022-v1.10_treecover2000_10N_030E156b53161f58 
name        : Hansen_GFC-2022-v1.10_treecover2000_10N_030E 
min value   :                                            0 
max value   :                                           95 

[[2]]
class       : SpatRaster 
dimensions  : 20216, 19822, 1  (nrow, ncol, nlyr)
resolution  : 0.00025, 0.00025  (x, y)
extent      : 35.257, 40.2125, 8.71475, 13.76875  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : vrt_Hansen_GFC-2022-v1.10_treecover2000_10N_030E156b393b6a54 
name        : Hansen_GFC-2022-v1.10_treecover2000_10N_030E 
min value   :                                            0 
max value   :                                          100 

[[3]]
class       : SpatRaster 
dimensions  : 27510, 35363, 1  (nrow, ncol, nlyr)
resolution  : 0.00025, 0.00025  (x, y)
extent      : 34.13925, 42.98, 3.5095, 10.387  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : vrt_Hansen_GFC-2022-v1.10_treecover2000_10N_030E156b1c6509cd 
name        : Hansen_GFC-2022-v1.10_treecover2000_10N_030E 
min value   :                                            0 
max value   :                                          100 

[[4]]
class       : SpatRaster 
dimensions  : 16075, 17020, 1  (nrow, ncol, nlyr)
resolution  : 0.00025, 0.00025  (x, y)
extent      : 34.877, 39.132, 4.439, 8.45775  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : vrt_Hansen_GFC-2022-v1.10_treecover2000_10N_030E156b19830fd7 
name        : Hansen_GFC-2022-v1.10_treecover2000_10N_030E 
min value   :                                            0 
max value   :                                          100 
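
For a rough sense of scale, the dimensions printed above can be turned into a memory estimate. This is a back-of-the-envelope sketch, assuming each cell is held in memory as an 8-byte double and counting only a single layer; `dims` and `bytes_per_cell` are names made up for this illustration, and real peak usage during processing will likely be higher.

```r
# Raster dimensions (rows, columns) copied from the four SpatRaster print-outs above.
dims <- data.frame(
  asset = 1:4,
  nrow  = c(22631, 20216, 27510, 16075),
  ncol  = c(10821, 19822, 35363, 17020)
)

# Assumption: every cell is stored as an 8-byte double once read into memory.
bytes_per_cell <- 8

# Number of cells and approximate size of one in-memory layer in GiB.
dims$cells <- dims$nrow * dims$ncol
dims$gib   <- dims$cells * bytes_per_cell / 1024^3

print(dims)
```

Under that assumption the third raster alone comes to roughly 7.2 GiB (the others to about 1.8, 3.0 and 2.0 GiB), before the loss-year layer or any intermediate copies are counted. Asset 3 is also the one that reported `std::bad_alloc` in the original report, which may not be a coincidence if the list order matches the asset order.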

goergen95 commented 2 months ago

Could you please try the improve-gfw branch? It should be slightly more efficient, though you will eventually run into memory issues if you increase the size of your polygons. I will investigate automated chunking, but that is a topic for another issue.
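
Until something automated exists, the manual subdivision suggested earlier in this thread could look roughly like the sketch below. It uses only `sf` (`st_make_grid()` and `st_intersection()`); `split_into_chunks()`, `parent_id`, `cell_id` and the toy polygon are hypothetical names for illustration and are not part of mapme.biodiversity. Summed chunk areas stand in for the per-chunk indicator values that would be added up per original polygon.

```r
library(sf)

# Hypothetical helper (not part of mapme.biodiversity): cut every polygon in `x`
# along a regular grid so that no piece is wider or taller than `cell_size`
# (in the units of the coordinate reference system, e.g. degrees for lon/lat).
split_into_chunks <- function(x, cell_size = 1) {
  cells <- st_make_grid(x, cellsize = cell_size)
  grid <- st_sf(cell_id = seq_along(cells), geometry = cells)
  # Remember which original polygon each piece came from.
  x$parent_id <- seq_len(nrow(x))
  # Clip the polygons by the grid cells; the attribute warning is expected here.
  chunks <- suppressWarnings(st_intersection(x, grid))
  # Keep polygonal pieces only, in case boundary touches yield points or lines.
  chunks[st_geometry_type(chunks) %in% c("POLYGON", "MULTIPOLYGON"), ]
}

# Toy 2 x 2 square in planar coordinates, standing in for a large region.
toy <- st_sf(
  name = "toy",
  geometry = st_sfc(st_polygon(list(
    rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))
  )))
)

chunks <- split_into_chunks(toy, cell_size = 1)

# "Take the sum": aggregate a per-chunk quantity back to the original polygon.
area_by_parent <- tapply(as.numeric(st_area(chunks)), chunks$parent_id, sum)
print(area_by_parent)
```

In a real run the chunks would be passed through `get_resources()` and `calc_indicators()` instead of `st_area()`, and the yearly values summed per `parent_id`. One caveat: with `min_size` set, forest patches cut by a chunk boundary may be filtered differently, so the summed result may deviate slightly from an unchunked run.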

karpfen commented 2 months ago

Cool, I'm currently running this, thanks!

goergen95 commented 1 month ago

Chunking for large assets is now implemented and available on main via d72e37e955991a6fe7b3efc9f1f526449c24a59c, so I am closing this issue. Feel free to re-open it if you still encounter problems.