isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/

applying functions before summary #37

Closed chljl closed 3 years ago

chljl commented 3 years ago

Can I apply a math function (e.g. log or sin) to raster values after they are extracted with polygons but before a summary operation (e.g. sum, mean) is applied? I ask because I sometimes have very large rasters (GB to tens of GB) and want to extract values for a number of small polygons. Applying these preprocessing functions directly to the original rasters would take forever, but applying them after extraction to each polygon's extent should be much cheaper. If that functionality does NOT exist, could you add it?

dbaston commented 3 years ago

Can you tell me a bit more about your specific use case? For example, I've thought about adding an operation for the mean of noise levels in dB. On the other hand, a truly generic preprocessing capability strikes me as useful but more in the domain of the raster package.

chljl commented 3 years ago

Thanks for getting back to me so quickly! In one of my cases, I have a very large raster of topographic slope in degrees for each pixel, and a shapefile of polygons marking the boundaries of many watersheds. I want to calculate the average slope for each watershed, but first want to convert slope in degrees to slope gradient using the tangent function (tan(x*pi/180)). It would be more convenient to do this calculation after extracting the raster values than on the original raster.

dbaston commented 3 years ago

Is exact_extract(rast, poly, function(x,c) weighted.mean( tan(x*pi/180), c )) an option, or does that use too much memory?

A custom R function requires all cells intersecting a polygon to be loaded into memory at once. If you have a huge raster AND huge polygons, the built-in summary operations (e.g. sum, mean) can avoid that. But unless you want the average slope of Russia, you should be OK with the R function.
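As a rough illustration of what that summary function computes, here is a base R sketch that runs without exactextractr. The name mean_gradient and the cell values are made up for the example. The function has the same two-argument form as the one passed to exact_extract above. The sketch also shows why the conversion belongs before the averaging step: the mean of the gradients is not the gradient of the mean slope.

```r
# Summary function in the same form as the one above:
# x = cell values (slope in degrees), c = coverage fraction of each cell.
mean_gradient <- function(x, c) {
  weighted.mean(tan(x * pi / 180), c)
}

# Hypothetical cells for a single polygon (made-up numbers).
slope_deg <- c(0, 45)
coverage  <- c(1, 1)

# Convert each cell to a gradient, then average: (tan(0) + tan(45 deg)) / 2
gradient_of_cells <- mean_gradient(slope_deg, coverage)

# Average the slopes first, then convert: tan(22.5 deg), a different number
gradient_of_mean <- tan(weighted.mean(slope_deg, coverage) * pi / 180)

print(c(gradient_of_cells = gradient_of_cells,
        gradient_of_mean  = gradient_of_mean))
```

With real data, the same function would be supplied as the third argument, as in exact_extract(rast, poly, mean_gradient).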

chljl commented 3 years ago

That is great! I never thought I could put math functions in like this. I fed my raster and polygon data directly into this expression and it looks like it did the job pretty quickly. Amazing! But I don't quite understand the syntax here. I guess x is rast and c is the coverage fraction. Shouldn't I put rast in the weighted.mean() part in place of x?

dbaston commented 3 years ago

The function will get called for each polygon, so x represents the values of rast within a given polygon, and c represents the coverage fractions associated with those values. This is similar to raster::extract, except that raster::extract does not deal with coverage fractions.
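The per-polygon calling convention can be mimicked in base R. This is only a sketch under assumptions: the list cells_by_polygon, the watershed names, and all numbers are hypothetical stand-ins for what would be handed to the function for each polygon, and the vapply loop imitates the "called once per polygon" behaviour rather than reproducing how the package works internally.

```r
# Hypothetical stand-in for the cells that intersect each polygon:
# one data frame per polygon, holding cell values and coverage fractions.
cells_by_polygon <- list(
  watershed_a = data.frame(value = c(10, 20, 30),
                           coverage_fraction = c(1, 0.5, 0.25)),
  watershed_b = data.frame(value = c(5, 15),
                           coverage_fraction = c(0.8, 0.2))
)

# The user function sees only one polygon's values (x) and fractions (c).
summary_fun <- function(x, c) weighted.mean(x, c)

# Call the function once per polygon, collecting one number for each.
weighted_means <- vapply(
  cells_by_polygon,
  function(cells) summary_fun(cells$value, cells$coverage_fraction),
  numeric(1)
)

# Ignoring coverage fractions treats partially covered cells as fully covered.
unweighted_means <- vapply(cells_by_polygon,
                           function(cells) mean(cells$value),
                           numeric(1))

print(rbind(weighted_means, unweighted_means))
```

In this made-up data, the partially covered cells of watershed_a pull the unweighted mean (20) away from the coverage-weighted one (about 15.7), which is the difference the coverage fractions are there to account for.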