isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/
274 stars 26 forks source link

Predefined 'sum' function returning 0 instead of NA with raster extended internally by the package #18

Closed NB-Bio closed 4 years ago

NB-Bio commented 4 years ago

Hi again, I was testing your fantastic package with my data and I found something weird using it with polygons whose extent is greater than the raster one.

Here the data for the reproducible example

Prologue I tried this: species$sum <- exact_extract(r.example, species, 'sum') where r.example is the raster and species is the shapefile, and I got 0 as results both for Species 1 and Species 3.

Basically, "Species 1" polygon lies outside the raster range. I suppose that in cases like this, the package extends the raster before any other summary operation. On the other hand, "Species 3" lies within the raster range but only overlaps raster cells with NA.

Since I expected to get NA for Species 1 and 3 (and not 0), I started to investigate and I found this . That is, when R sum function is applied to only NA values, with na.rm = T, it returns 0 (and not NA as one could expect). In R language: sum(NA, NA, na.rm = T) returns 0

To solve the issue, the solution proposed is to use na.rm = any(!is.na(x))).

Here starts the issue with the exactextract package.

In order to apply the aforementioned solution with the sum operation, I used the package with the following customised function:

species$sum <- exact_extract(r.example, species, function(values, coverage_frac) sum(values, na.rm = any(!is.na(values))))

The command worked properly returning NA for Species 3 , but not for Species 1 (it returned 0). The only reason I could imagine was some kind of problem with the raster expansion internal to the package functioning (since Species 1 lies outside the raster). To test my hypothesis I previously expanded the raster, introducing NA for the new cells:

e <- extent(-180, 180, -90, 90)
r.ext <- extend(r.example, e, value = NA)

Then I re-used the customised function with the new manually-extended raster:

species$sum <- exact_extract(r.ext , species, function(values, coverage_frac) sum(values, na.rm = any(!is.na(values))))

This worked properly, returning NA both for Species 1 and 3. Maybe, when extending the raster 0 values are introduced instead of NAs? This could explain such behaviour.

I didn't test the issue with the other functions.

Thanks for your work!

dbaston commented 4 years ago

You can get some insight into what's happening by running exact_extract without a summary operation or function:

exact_extract(r.example, species)

You'll see that for species #1, a zero-row data frame is produced. This is because no raster cells intersect this polygon. For species #3, a three-row data frame is produced, where each value is NA. (The polygon intersects three cells, all of which have a value of NA.). These seem like expected results to me.

A summary function that might accomplish what you're looking for is

function(x, c) ifelse(any(!is.na(x)), sum(x, na.rm=TRUE), NA)

NB-Bio commented 4 years ago

Thank you for your answer, I thought the package extended the rasters in case like this, now it's clear.