isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/

Weighted_sum is giving weird results #52

Closed shuningge closed 3 years ago

shuningge commented 3 years ago

Hello!

I am using the weighted_sum function to calculate the total population of each parish in Uganda (polygons named parish) from a WorldPop population count raster (named population). I tried both:

  1. exact_extract(population, parish, 'weighted_sum', weights = area(population))
  2. exact_extract(population, parish, 'sum')

They returned very different results: weighted_sum has a maximum of ~500, while sum has a maximum of ~67,000. When I sum across all parishes, neither method 1 nor method 2 gives a national total that is consistent with the sum of the original raster file.

I also tested the weighted_sum function using the Brazil example, and the simple sum and weighted_sum produce very different results there too. I am wondering what might be going wrong.

I am using macOS Big Sur, and both R and the exactextractr package are at their current versions. Your help is greatly appreciated!

dbaston commented 3 years ago

The results should be quite different: with weights = area(population), the weighted_sum operation is defined as sum(population * coverage_fraction * area), whereas the sum operation is defined as sum(population * coverage_fraction).

To get a total parish population you'd want to use sum with a population count raster, or weighted_sum with a population density raster.
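To make the difference concrete, here is a minimal sketch in base R. It only mimics the arithmetic of the two operations and does not call exact_extract; the three cells, their counts, coverage fractions, and cell areas are made-up values, not taken from the WorldPop raster.

```r
# Hypothetical values for three cells that touch one polygon
count    <- c(100, 200, 50)      # people per cell (a population *count* raster)
coverage <- c(1.0, 0.5, 0.25)    # fraction of each cell inside the polygon
area_km2 <- c(0.01, 0.01, 0.01)  # assumed cell areas in km^2, standing in for area(population)

# 'sum': cell values weighted by coverage fraction only
sum_count <- sum(count * coverage)

# 'weighted_sum' with area weights: every term is also multiplied by cell area
wsum_count <- sum(count * coverage * area_km2)

# With a *density* raster (people per km^2), the area weights recover the total
density      <- count / area_km2
wsum_density <- sum(density * coverage * area_km2)

print(c(sum_count = sum_count, wsum_count = wsum_count, wsum_density = wsum_density))
```

In this sketch, applying area weights to a count raster shrinks the result by roughly the cell area, while applying them to a density raster gives the same total as the plain sum of counts.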

shuningge commented 3 years ago

Dear Dan, thank you so much for the quick response! This is clear. Much appreciated!

However, I am still a little confused. This weighted_sum seems unusual to me, because I was assuming that sum would be defined as sum(population of each pixel that is completely within the polygon), whereas weighted_sum would be sum(population * coverage_fraction). By definition, population * coverage_fraction should already account for pixels / grid cells that are partially within the polygon, right? And shouldn't weighted_sum and sum be identical here (in the example of a gridded population calculation), considering that population density * area should equal population count?

Also, I just want to make sure that I don't misunderstand or misuse mean and weighted_mean. For example, I am calculating a weighted average poverty rate for each parish from a poverty rate raster at 1 km resolution. Would mean calculate the average across pixels that are completely within the polygon, while weighted_mean is defined as mean(poverty rate of a pixel * coverage_fraction matrix), so that pixels that are partially within the polygon are correctly included based on the overlapping area?

Your guidance is greatly appreciated! Thank you for this amazing package.

dbaston commented 3 years ago

Have a look at the definitions of the summary functions here or as formulas here. sum, weighted_sum, mean, and weighted_mean all account for partially covered pixels.
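As a rough illustration of how the two mean operations treat partially covered pixels, the sketch below reproduces the arithmetic in base R without calling exact_extract. The poverty rates, coverage fractions, and population weights are hypothetical values chosen only for the example.

```r
# Hypothetical values for three cells that touch one polygon
rate     <- c(0.20, 0.40, 0.60)  # poverty rate per cell
coverage <- c(1.0, 0.5, 0.25)    # fraction of each cell inside the polygon
pop      <- c(1000, 100, 10)     # assumed second raster used as weights

# 'mean': average of the values, weighted by coverage fraction
mean_rate <- sum(rate * coverage) / sum(coverage)

# 'weighted_mean': weighted by coverage fraction times the weighting raster
wmean_rate <- sum(rate * coverage * pop) / sum(coverage * pop)

print(c(mean = mean_rate, weighted_mean = wmean_rate))
```

Both results include the partially covered cells; the weighting raster only changes how much each cell contributes, which is why a population-weighted poverty rate would use weighted_mean with a population raster as weights.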

If you want a calculation that considers only pixels that are completely within the polygon, you'd need to use an R function, like

exact_extract(population, parish, function(value, coverage_fraction) {
  # keep only cells that lie entirely inside the polygon
  sum(value[coverage_fraction == 1])
})