Closed kongdd closed 3 years ago
Can you provide an example usage where this performs better?
In hydrological studies, huge shapefile with thousand features is quite often.
The overlap operation is the most time-consuming procedure in extract
. It is desired to separate the function of overlap and extract. I will give you an example few hours later.
Sorry about the later response.
I have 21 tif files, corresponding to yearly ET estimation during 2000-2020. The shapefile is 10631 basins in the global scale.
(a) If separate the function of overlap and extract, the running time is 140 second; (b) If not, it is 190 second.
note: raster reading costs about 64 second second.
Because overlaping operation only performed once. This make (a) is 1.66 times faster than (b) (76 seconds vs. 126 seconds). Hence, I think it is necessary to keep previous version CPP_exact_extract function.
library(exactextractr)
library(extract2)
library(raster)
library(sf)
files <- dir("H:/global_WB/ET/diagnosed_PML/v017", "ERA5.*.tif", full.names = TRUE)
shp = read_sf("N:/Research/hydro/globalRunoff/INPUT/shp/globalRunoff_GSIM_10631basins.shp")
## 1. overlap and extract separately
blocks = overlap(files[1] %>% raster(), shp, return.id = FALSE) # overlap information for each feature
system.time(lst <- st_extract(files, blocks))
# 140.97s
## 2. exactextractr way
wkbs = sf::st_as_binary(sf::st_geometry(shp), EWKB=TRUE)
system.time({
lst2 = llply(files, function(file) {
r = brick(file) %>% readAll()
exact_extract(r, shp, "mean")
})
})
# 190.0 s
system.time({
lst2 = llply(files, function(file) {
r = brick(file) %>% readAll()
# exact_extract(r, shp, "mean")
}, .progress = "text")
})
# 64 s
Why not make a RasterStack
of all the input tif
files?
RasterStack is one solution. But not the best, because each tif file already have multiple bands. Besides, read all files into R will likely induce memory problem, especially for large files. If not reading into memory, so many features, the speed is very slow.
A small action will improve the performance substantially. It worth to keep previous version.
How long does a RasterStack
take without readAll
? You might also try a terra
SpatRaster
for better performance, at least with some file formats.
Fast, 30s. But the result is incorrect. It looks only the result of the first year returned.
rs = files %>% map(rast) %>% do.call(c, .)
r = exact_extract(rs, shp, "mean")
It looks like this happens because terra
doesn't make the names unique, so the columns in the returned data frame overwrite each other. Try making the names unique yourself and give it a try.
works. Using time reduced to 33s. But the result needs further tidy.
But, I still think it worth to keep previous version, which is valuable for me. If no this function, st_extract
and all of my self defined function zonal statistic function will unable to work.
Your small action, will give me or other users big convenience. Cheers!
I'm happy to help troubleshoot performance problems and improve this package. But I am not seeing a performance problem that is addressed by this PR, and so I am hesitant to add code (especially with no documentation or tests) and accept its associated maintenance burden.
Thank you. I will try hard to follow the step of exactextractr
.
Could you give some clue about how to use exact_extract
correctly?
The following script in a relative low computing efficiency, which used 97s.
rs <- files %>% map(rast) %>% do.call(c, .)
r <- exact_extract(rs, shp, "weighted_mean", weights = rs * 0 + 1) # only area weighted
In comparison, st_extract
used 64s.
weights = rs * 0 + 1
probably takes a while to execute, because it has to read in the entire input. If you don't want a weighted mean, why not exact_extract(rs, shp, 'mean')
?
I need the area weighted mean. Is the following the correct?
rs <- files %>% map(rast) %>% do.call(c, .)
r <- exact_extract(rs, shp, "weighted_mean", weights = area(rs))
terra::area
doesn't do what I'd expect. Either way, exact_extract(rs, shp, 'weighted_mean', weights = 'area')
should be faster, or at least less memory=intensive.
Got it. Thank you!
This
CPP_exact_extract2
is previous script ofCPP_exact_extract
in v0.5.0, which is valuable for multiple files extraction. Hope this function can be kept in the master branch. Thank you!Those script has been tested in win10 and ubuntu 20.04. exactextractr works well on both of them.