Closed kongdd closed 3 years ago
We can't support weights
for a self-defined function with the same flexibility as with the built-in operations, where the value raster and weighting raster can have different extents and resolutions.
For self-defined functions, I'm not sure whether it's better to support weights
for the special case where the value raster and weighting raster have identical resolutions/extents, or to just throw an error that the argument is unsupported. Supporting weights
in this limited way would really just be sugar for making a stack of the value and weighting rasters, like the README example.
weights
Because the grid_area is smaller when approaching the Polar region. It is common to use fraction*grid_area
as the weight. I think it is necessary to support weights
in the self defined function.
exactextractr:::CPP_exact_extract
I think the step of exactextractr:::CPP_exact_extract
should be separated from the exact_extract
function.
Asuming I have a list of list(r, r, r, r, ..., r), with the length of 10, r is the raster of different years. In this way, exactextractr:::CPP_exact_extract
need to run 10 times. Actually, one time is enough. Meanwhile, exactextractr:::CPP_exact_extract
is the most time-consuming function in the exact_extract
.
Hence, by separating exactextractr:::CPP_exact_extract
from the exact_extract
function, exact_extract
can boost another dozens of times in the speed.
This is my solution in the R language, https://github.com/kongdd/extract2/blob/master/R/extract2.R. In my situation, I need to clip 15 yearly tif images by global 9500 basins. extract2
works and speed up about 10 times in my application.
extract2
just separate exactextractr:::CPP_exact_extract
from the exact_extract
function. Only 100 lines of code.
Hope it can be a part of exact_extract
.
Thanks for sharing this. Right now the reuse of the coverage fractions (output of CPP_exact_extract
) is possible in exactextractr
by making a stack of the inputs. Area weights can be incorporated by adding grid area to the stack.
Using the data from the README:
z <- data.frame(t(exact_extract(stack(prec, area(prec)),
brazil[1:5, ],
function(x, c) {
area <- x[, 13]
apply(x[, 1:nlayers(prec)], 2, function(y) weighted.mean(y, c*area))
})))
This is ugly, and I'm not saying it's a good solution. But it is possible. To clean this up, I think you're really getting at two issues:
1) How can we apply the same function to every layer in an input, without doing apply
ugliness like above? (This is currently supported for the named summary operations). Supporting it for R functions could be as simple as adding a flag to do this instead of passing a dataframe with all stack values to the R function.
2) How can we pass weights to an R function? As I mentioned above, that would be easy as long as we accept the restriction that the weights must have the same extent/resolution as the values. For the area case you're describing, that's certainly going to be true.
Thanks for your reply. I will try the performance of stack
function.
But I am confused about those two issues.
As my problem can be solved by extract2
, so we can close this issue now.
I'll keep it open as an enhancement request. I've largely implemented (1) here: https://gitlab.com/isciences/exactextractr/-/merge_requests/40/diffs#860fa0c79948a18338df5a974ebaaceb12128a00_308_325
(Un)fortunately there are a lot of different argument combinations for exact_extract
, and I need to spend some more time making sure all appropriate combinations have tests in place.
Point 1 above was resolved with 2a5182dcf4ca6b87b311a18fc3512f0f26211ee1; still need to handle Point 2.
@kongdd , I have a branch that supports passing weights
to a self-defined function, using any combination of RasterLayer
and RasterStack
for values and weights. The pixel coverage calculations (CPP_exact_extract
) are reused throughout the stack. If you would like to test it, you can install it via
remotes::install_git('https://gitlab.com/isciences/exactextractr', ref='rfunc-weights')
An example demonstrating both features using the README data is
mean_prec <- exact_extract(prec, brazil,
function(value, cov_frac, weight) {
weighted.mean(value, cov_frac * weight)
},
weights = area(prec),
stack_apply = TRUE,
append_cols = c('GID_2', 'NAME_2'))
good job. Thank you
Resolved by eeea9ea328b7fc5a356b27f612f212951c9a2bce
Previous version looks much suitable for me, which is more computing efficiency in my case.
Any suggestion how to make CPP_exact_extract
return row, col, nrow like previous version?
area <- raster::area(r)
geoms %<>% check_wkb() # convert to wkbs
blocks <- llply(geoms, function(wkb) {
ret <- exactextractr:::CPP_exact_extract(r, wkb = wkb)
names(ret)[3] <- "fraction"
dim <- dim(ret$fraction)
ret$nrow <- dim[1]
ret$ncol <- dim[2]
ret$fraction <- as.vector(t(ret$fraction))
ret$area <- raster::getValuesBlock(area,
row = ret$row,
col = ret$col,
nrow = ret$nrow, ncol = ret$ncol
)
ret
}, .progress = "text")
Call for keeping the interface of previous version CPP_exact_extract
.
weights
is ignored, iffun
is a self defined function. https://github.com/isciences/exactextractr/blob/master/R/exact_extract.R#L293