isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/

Stand alone function to generate weight table? #8

Closed by dblodgett-usgs 5 years ago

dblodgett-usgs commented 5 years ago

Greetings!

I've been working on a project, https://github.com/USGS-R/intersectr, that implements a workflow similar to exactextractr's but, so far, focuses on grids of low enough resolution that it is reasonable to represent each cell as a geometry. I currently handle curvilinear and rectilinear NetCDF grids. I hope to expand support to high-resolution rectilinear raster grids, à la zonal statistics, as well as ad hoc polygon coverages.

For performance and flexibility, the workflow in intersectr uses cacheable files that contain "cell geometry" -> "area weights" -> "extracted data". I mention this to illustrate that I generate a table containing the area weight for each combination of source data cell and destination geometry. The area weights use a cell ID assigned row by row, so 1 is the upper-left cell and nrow*ncol is the lower-right cell.
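To make the ID scheme concrete, here is a minimal sketch of the row-major numbering described above and of one row of such a weight table. It is written in Python purely for illustration; the names `cell_id`, `row_col`, and `AreaWeight` are hypothetical and are not part of intersectr or exactextractr.

```python
from typing import NamedTuple, Tuple


class AreaWeight(NamedTuple):
    """One row of a hypothetical area-weight table."""
    cell_id: int     # 1-based, row-major: 1 = upper left, nrow*ncol = lower right
    polygon_id: str  # identifier of the destination geometry (assumed to be a string)
    weight: float    # fraction of the cell covered by the polygon


def cell_id(row: int, col: int, ncol: int) -> int:
    """Return the 1-based row-major ID for a 0-based (row, col) position."""
    return row * ncol + col + 1


def row_col(cell: int, ncol: int) -> Tuple[int, int]:
    """Inverse of cell_id: return the 0-based (row, col) for a 1-based ID."""
    return divmod(cell - 1, ncol)


nrow, ncol = 3, 4
upper_left = cell_id(0, 0, ncol)
lower_right = cell_id(nrow - 1, ncol - 1, ncol)

# A single illustrative table row: polygon "basin-1" covers 25% of the last cell.
example_row = AreaWeight(cell_id=lower_right, polygon_id="basin-1", weight=0.25)
```

With this numbering, converting between the ID and a row/column pair needs only the number of columns, which is why a table keyed on the ID alone can be cached and reused.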

Would you be interested in exporting a function that could generate such area weights? I'm hoping to get intersectr to CRAN too... maybe we could team up? Really like the exactextractr method!!

dbaston commented 5 years ago

Thanks for reaching out, @dblodgett-usgs. I saw your intersectr package come through my feed a couple of weeks ago and have been meaning to check it out in some more detail.

I just added a couple of examples to the README that show additional uses of this package. One is the use of exact_extract without a function parameter, which returns a table of cell values and coverage fractions. Maybe a tweaked version of this function that returns a row/column number would provide you with what you need? Another function is partial_mask, which returns the coverage fractions as a RasterLayer, though this is prone to quickly exhausting available memory.
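As a rough illustration of what a table of cell values and coverage fractions is good for, the sketch below computes a coverage-weighted mean from two parallel lists. It is plain Python, not the exactextractr API; `weighted_mean` and the sample numbers are assumptions made up for this example.

```python
from typing import Optional, Sequence


def weighted_mean(values: Sequence[float],
                  coverage_fractions: Sequence[float]) -> Optional[float]:
    """Mean of cell values weighted by the fraction of each cell covered.

    Returns None when the polygon covers no cell area at all.
    """
    if len(values) != len(coverage_fractions):
        raise ValueError("values and coverage_fractions must have the same length")
    total = sum(coverage_fractions)
    if total == 0:
        return None
    return sum(v * f for v, f in zip(values, coverage_fractions)) / total


# Three cells touched by one polygon: fully covered, half covered, not covered.
cell_values = [10.0, 20.0, 30.0]
fractions = [1.0, 0.5, 0.0]
mean_value = weighted_mean(cell_values, fractions)
```

Adding a cell ID or row/column column to the same table would be enough to turn it into a reusable weight table of the kind discussed in this thread.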

I had initially thought of persisting these area weights between uses, but for my usage the cost of computing the weights isn't much different from LZW-uncompressing a stored version of the same. (I see how support for curvilinear grids, arbitrary polygonal coverages, etc. would absolutely change the equation.) More of my attention lately has been on the C++ command-line version of exactextract, where I take the approach of specifying in advance all of the rasters and features to be processed, computing the intersection values a single time, and then calculating the statistics for each feature/raster combination. But I'd be happy to make some tweaks to the R package to make it easier to interact with.
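The "compute the intersection once, then reuse it" idea can be sketched as follows. This is not how the exactextract command-line tool is implemented; it is a simplified Python illustration that assumes every raster shares the same grid, stores each raster as a flat row-major list, and uses hypothetical names (`summarize`, `weights`, `rasters`).

```python
from typing import Dict, List, Optional, Tuple

# feature ID -> list of (1-based row-major cell ID, coverage fraction)
Weights = Dict[str, List[Tuple[int, float]]]


def summarize(weights: Weights,
              rasters: Dict[str, List[float]]) -> Dict[Tuple[str, str], Optional[float]]:
    """Coverage-weighted mean for every feature/raster combination.

    The weights are computed once elsewhere and reused for each raster,
    which is only valid because all rasters are assumed to share one grid.
    """
    results: Dict[Tuple[str, str], Optional[float]] = {}
    for feature_id, cells in weights.items():
        total = sum(fraction for _, fraction in cells)
        for raster_name, values in rasters.items():
            if total == 0:
                results[(feature_id, raster_name)] = None
                continue
            weighted_sum = sum(values[cell - 1] * fraction for cell, fraction in cells)
            results[(feature_id, raster_name)] = weighted_sum / total
    return results


# One feature covering all of cell 1 and half of cell 2 on a 2 x 2 grid.
weights = {"a": [(1, 1.0), (2, 0.5)]}
rasters = {
    "temperature": [10.0, 20.0, 30.0, 40.0],
    "precip": [1.0, 1.0, 1.0, 1.0],
}
stats = summarize(weights, rasters)
```

Whether caching pays off depends on how expensive the weights are to compute compared with reading them back, which is the trade-off raised above.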

dblodgett-usgs commented 5 years ago

Hmmm... This is interesting...

"... but for my usage the cost of computing the weights isn't much different from LZW-uncompressing a stored version of the same."

We have an old system implemented in Java with the JTS that does unweighted statistics for very dense grids, and the same thing was true. The point-in-polygon test was so fast that it just gets recalculated rather than stored in memory (which blows up real fast, as you say).

I'll have to think about how I want to do this. I need to get a use case and a dataset to test it out. I'm also planning on morphing the intersectr package toward stars as that package matures, so there are still lots of open questions.

Thanks for the response and samples. I'll keep an eye on this as things progress.