isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/

Progress bar is misleading when "frac" is used #88

Closed: Nowosad closed this issue 2 years ago.

Nowosad commented 2 years ago

Hi @dbaston -- this is not a huge issue, but I wanted to let you know about it (feel free to close it). I have raster data with 16 billion cells and 8 layers, and a set of 3 million polygons. The progress bar works well for some other summary functions I tested: for fun = "median", for example, it moves from 0 to 100% in about 2 hours and then returns a result. For fun = "frac", however, it goes from 0 to 100% in about an hour, but the calculation then continues for roughly another 7 hours.

I can prepare a code example if you want one.

dbaston commented 2 years ago

That's odd. Any chance you can share your data?

Nowosad commented 2 years ago

Yes, of course. The vector data window50 is at https://we.tl/t-4sST9i5bX0.

library(exactextractr)
library(terra)
library(sf)
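# Download the 2001 NLCD land cover raster for the lower 48 states
curl::curl_download("https://storage.googleapis.com/feddata-r/nlcd/2001_Land_Cover_L48.tif",
                    destfile = "2001_Land_Cover_L48.tif")

lc_rast = rast("2001_Land_Cover_L48.tif")
# window50a.fgb is the polygon layer shared via the link above
window50 = read_sf("window50a.fgb")

# Fraction of each polygon covered by each distinct raster value
lc_rast_extract1 = exact_extract(lc_rast, window50, "frac")

dbaston commented 2 years ago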

Thanks, this is helpful.

The progress bar is lying because it assumes that once every polygon has been processed, no significant work remains. That turned out to be wrong here, because the way I was identifying the set of unique raster values across all polygons was really slow. I improved that in b3c3dffe49eee1a6e8ac4670c0b3bbe8b6ddd07d. If there is still a big lag, it's likely possible to do better, or to let the user specify the set of unique values (if known) so this step can be skipped.
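To illustrate why "frac" has this extra step: the result needs one column per distinct raster value seen in *any* polygon, so the global set of unique values must be known before the final table can be assembled, and that pass happens after the per-polygon work the progress bar tracks. The sketch below is a hypothetical plain-R illustration, not exactextractr's actual code. The function name `frac_table()` and its input format (a list with one data frame per polygon holding `value` and `coverage_fraction` columns, roughly like what `exact_extract()` returns with no summary function) are assumptions.

```r
# Hypothetical illustration only; not exactextractr's implementation.
# per_polygon: list of data frames, one per polygon, each with columns
#   value             - raster value of an intersected cell
#   coverage_fraction - fraction of that cell covered by the polygon
frac_table <- function(per_polygon) {
  # Global pass: unique values across ALL polygons, needed so every
  # polygon's row has the same set of frac_* columns.
  all_values <- sort(unique(unlist(lapply(per_polygon, `[[`, "value"))))

  # Per-polygon pass: coverage-weighted share of each distinct value.
  rows <- lapply(per_polygon, function(df) {
    w <- tapply(df$coverage_fraction,
                factor(df$value, levels = all_values), sum)
    w[is.na(w)] <- 0  # values absent from this polygon get 0
    as.numeric(w) / sum(df$coverage_fraction)
  })

  out <- as.data.frame(matrix(unlist(rows), ncol = length(all_values),
                              byrow = TRUE))
  names(out) <- paste0("frac_", all_values)
  out
}

# Usage with two toy polygons
p1 <- data.frame(value = c(11, 11, 21), coverage_fraction = c(1, 0.5, 0.5))
p2 <- data.frame(value = 41, coverage_fraction = 1)
frac_table(list(p1, p2))
```

With millions of polygons, the global pass touches every extracted value, so it can take noticeable time even after every polygon has been processed.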

Nowosad commented 2 years ago

Thanks a lot, @dbaston. It now works much faster (roughly 8 times faster, I would guess).

dbaston commented 2 years ago

This has been released to CRAN in 0.9.1, along with an important bug fix for this same usage (#89).

Nowosad commented 2 years ago

Great -- thank you, Dan.