isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/
274 stars 26 forks source link

`weights` is ignored, if `fun` is a self defined function. #36

Closed kongdd closed 3 years ago

kongdd commented 3 years ago

weights is ignored, if fun is a self defined function. https://github.com/isciences/exactextractr/blob/master/R/exact_extract.R#L293

dbaston commented 3 years ago

We can't support weights for a self-defined function with the same flexibility as with the built-in operations, where the value raster and weighting raster can have different extents and resolutions.

For self-defined functions, I'm not sure whether it's better to support weights for the special case where the value raster and weighting raster have identical resolutions/extents, or to just throw an error that the argument is unsupported. Supporting weights in this limited way would really just be sugar for making a stack of the value and weighting rasters, like the README example.

kongdd commented 3 years ago
  1. weights Because the grid_area is smaller when approaching the Polar region. It is common to use fraction*grid_area as the weight. I think it is necessary to support weights in the self defined function.

  2. exactextractr:::CPP_exact_extract I think the step of exactextractr:::CPP_exact_extract should be separated from the exact_extract function. Asuming I have a list of list(r, r, r, r, ..., r), with the length of 10, r is the raster of different years. In this way, exactextractr:::CPP_exact_extract need to run 10 times. Actually, one time is enough. Meanwhile, exactextractr:::CPP_exact_extract is the most time-consuming function in the exact_extract.

Hence, by separating exactextractr:::CPP_exact_extract from the exact_extract function, exact_extract can boost another dozens of times in the speed.

This is my solution in the R language, https://github.com/kongdd/extract2/blob/master/R/extract2.R. In my situation, I need to clip 15 yearly tif images by global 9500 basins. extract2 works and speed up about 10 times in my application. extract2 just separate exactextractr:::CPP_exact_extract from the exact_extract function. Only 100 lines of code.

Hope it can be a part of exact_extract.

dbaston commented 3 years ago

Thanks for sharing this. Right now the reuse of the coverage fractions (output of CPP_exact_extract) is possible in exactextractr by making a stack of the inputs. Area weights can be incorporated by adding grid area to the stack.

Using the data from the README:

z <- data.frame(t(exact_extract(stack(prec, area(prec)),
                   brazil[1:5, ],
                   function(x, c) {
                     area <- x[, 13] 
                     apply(x[, 1:nlayers(prec)], 2, function(y) weighted.mean(y, c*area))
                   })))

This is ugly, and I'm not saying it's a good solution. But it is possible. To clean this up, I think you're really getting at two issues:

1) How can we apply the same function to every layer in an input, without doing apply ugliness like above? (This is currently supported for the named summary operations). Supporting it for R functions could be as simple as adding a flag to do this instead of passing a dataframe with all stack values to the R function.

2) How can we pass weights to an R function? As I mentioned above, that would be easy as long as we accept the restriction that the weights must have the same extent/resolution as the values. For the area case you're describing, that's certainly going to be true.

kongdd commented 3 years ago

Thanks for your reply. I will try the performance of stack function.

But I am confused about those two issues. As my problem can be solved by extract2, so we can close this issue now.

dbaston commented 3 years ago

I'll keep it open as an enhancement request. I've largely implemented (1) here: https://gitlab.com/isciences/exactextractr/-/merge_requests/40/diffs#860fa0c79948a18338df5a974ebaaceb12128a00_308_325

(Un)fortunately there are a lot of different argument combinations for exact_extract, and I need to spend some more time making sure all appropriate combinations have tests in place.

dbaston commented 3 years ago

Point 1 above was resolved with 2a5182dcf4ca6b87b311a18fc3512f0f26211ee1; still need to handle Point 2.

dbaston commented 3 years ago

@kongdd , I have a branch that supports passing weights to a self-defined function, using any combination of RasterLayer and RasterStack for values and weights. The pixel coverage calculations (CPP_exact_extract) are reused throughout the stack. If you would like to test it, you can install it via

remotes::install_git('https://gitlab.com/isciences/exactextractr', ref='rfunc-weights')

An example demonstrating both features using the README data is

mean_prec <- exact_extract(prec, brazil, 
                           function(value, cov_frac, weight) {
                             weighted.mean(value, cov_frac * weight)
                           },
                           weights = area(prec), 
                           stack_apply = TRUE,
                           append_cols = c('GID_2', 'NAME_2'))
kongdd commented 3 years ago

good job. Thank you

dbaston commented 3 years ago

Resolved by eeea9ea328b7fc5a356b27f612f212951c9a2bce

kongdd commented 3 years ago

Previous version looks much suitable for me, which is more computing efficiency in my case. Any suggestion how to make CPP_exact_extract return row, col, nrow like previous version?

area <- raster::area(r)
geoms %<>% check_wkb() # convert to wkbs

blocks <- llply(geoms, function(wkb) {
    ret <- exactextractr:::CPP_exact_extract(r, wkb = wkb)
    names(ret)[3] <- "fraction"
    dim <- dim(ret$fraction)
    ret$nrow <- dim[1]
    ret$ncol <- dim[2]
    ret$fraction <- as.vector(t(ret$fraction))
    ret$area <- raster::getValuesBlock(area,
        row = ret$row,
        col = ret$col,
        nrow = ret$nrow, ncol = ret$ncol
    )
    ret
}, .progress = "text")

Call for keeping the interface of previous version CPP_exact_extract.