hunzikp / velox

https://hunzikp.github.io/velox/
119 stars 23 forks source link

Enable 'na.rm = TRUE' during value extraction #3

Closed fdetsch closed 6 years ago

fdetsch commented 7 years ago

It would be nice to include a na.rm argument for the velox extraction method (see ?VeloxRaster_extract). Although the latter performs considerably faster as compared to raster::extract, I consider the lack of a proper handling of missing values a major drawback.

For example, the below code taken from gis-stackexchange works fine using a raster-based approach (i.e., extract) along with na.rm = TRUE:

library(raster)

## get country geometry
if (!dir.exists("data")) dir.create("data")
aut <- getData("GADM", country = "AUT", level = 2, path = "data")

## download worldclim data
path <- "data/worldclim"
if (!dir.exists(path)) dir.create(path)
bio <- getData("worldclim", var = "bio", res = 10, path = path)

## extract values
val <- extract(bio, aut, fun = mean, na.rm = TRUE)
val[1:5, 1:5]

[1,] 99.0 97.00 31.00 7392.00 262.00 [2,] 94.5 96.50 32.00 7305.25 256.00 [3,] 93.0 97.50 31.50 7362.50 253.00 [4,] 90.0 98.00 31.00 7390.00 251.50 [5,] 91.0 96.00 32.00 7278.00 252.00

Unfortunately, NAs are not omitted using velox (regardless of the fact that it performs amazingly fast):

library(velox)
vlx <- velox(bio)
xtr <- vlx$extract(aut, fun = function(x) mean(x, na.rm = TRUE))
xtr[1:5, 1:5]

[1,] 99 97.0 31.0 7392.0 262.0 [2,] NA NA NA NA NA [3,] 93 97.5 31.5 7362.5 253.0 [4,] 90 98.0 31.0 7390.0 251.5 [5,] 91 96.0 32.0 7278.0 252.0

Here is my sessionInfo:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] velox_0.1.0.9000 rgeos_0.3-22     raster_2.5-8     sp_1.2-4         devtools_1.12.0 

loaded via a namespace (and not attached):
 [1] httr_1.2.1      R6_2.2.0        rgdal_1.2-5     tools_3.3.2     withr_1.0.2     curl_2.3        Rcpp_0.12.9    
 [8] memoise_1.0.0   grid_3.3.2      git2r_0.18.0    digest_0.6.11   lattice_0.20-34
MatthieuStigler commented 7 years ago

I don't think the issue is not about NA values: just put NAs in example of velox$extract(), will work!). I think it is because your polygon is actually too small.

So as a workaround, you could just resample the data. Make first your example much smaller:

bio_easy <- crop(bio$bio1, extent(aut[1:3,]))
bio_easy2 <- mask(bio_easy, aut)

plot(bio_easy2)
plot(aut[1:3,], add=TRUE)

rplot

Resample and check:

bio_easy3 <- bio_easy2
nrow(bio_easy3) <- nrow(bio_easy2)*10

bio_easy3b <- raster::resample(bio_easy2, bio_easy3, method="ngb")

xtr <- velox(bio_easy3b)$extract(aut, fun = function(x) mean(x, na.rm = TRUE))
xtr[1:5,1]

What happens is that raster::extract() returns values even for small with very small coverage (check with: raster::extract(bio_easy2, aut, df=TRUE, weights=TRUE, normalizeWeights=FALSE)). On the other side, it seems like velox$extract() does not return these same cells. But this appears still to be an issue in itself?

hunzikp commented 6 years ago

As @MatthieuStigler has pointed out correctly, the initial issue was that the polygons didn't intersect with any cell centroid, thus the NA values. As of version 0.1.0.9007 VeloxRaster_extract features a small argument that avoids this type of problem. See ?VeloxRaster_extract for more information.

fdetsch commented 6 years ago

Great, thanks – exactly what I was looking for!

hsrobinson commented 6 years ago

I have a 30mx30m pixel resolution raster I would like to sample with a 1kmX1km cell sized polygon grid (fun=mean). It appears that in any cell where a single pixel is a missing value, Velox returns an NA for the entire cell. The number of NAs remains constant for the extract whether small is TRUE or FALSE. Am I understanding correctly that there is no way to have Velox ignore the missing pixels and return the mean value of the pixels that do reside within each grid cell?

abfleishman commented 6 years ago

@hsrobinson try using fun = function(x) mean(x, na.rm = TRUE)) instead of fun= mean

hsrobinson commented 6 years ago

That worked perfectly. Thank-you.