hunzikp / velox

https://hunzikp.github.io/velox/

extract: most efficient way to get weights? #16

Open MatthieuStigler opened 6 years ago

MatthieuStigler commented 6 years ago

Hi

I am interested in obtaining the weights of each cell with the extract function when it is used with fun=NULL (although weights could also be useful with a user-supplied fun, to weight the results).

I can see two ways:

  1. raster::extract does it roughly as follows, in pseudo-code (see the actual code in rasterizePolygons.R):

    r2 <- disaggregate(raster(x), 10)
    r3 <- "rasterize"(r2)
    aggregate(r3, 10, sum)

This could be used, although so far aggregate does not seem to handle NAs (see Issue 9)?

  2. Another approach is to do:

    r2 <- raster::disaggregate(raster(x), 10)
    r3 <- velox::extract(r2, df=TRUE)
    aggregate the resulting data-frame by cell

I use this second approach and get results identical to those of raster::extract. Before going into the details: is this necessarily the most efficient way, or can you think of something faster? (A sketch of the first, raster-only path is given just below; a timing comparison is sketched after the test code.)
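For completeness, here is a minimal sketch of the first path, assuming it mirrors what rasterizePolygons.R does: disaggregate the template raster, rasterize the polygon onto the finer grid, and re-aggregate with sum to get the fraction of each original cell that is covered. The helper name coverage_weights and the field/background arguments are my own illustrative choices, not raster internals.

## Sketch of approach 1, one polygon at a time; fact controls the precision
coverage_weights <- function(x, pol, fact = 10) {
  r_fine <- disaggregate(raster(x), fact)                      # finer template grid
  r_mask <- rasterize(pol, r_fine, field = 1, background = 0)  # 1 inside the polygon, 0 outside
  aggregate(r_mask, fact, fun = sum) / fact^2                  # fraction of each original cell covered
}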

Code (based on pull request 14, which allows df=TRUE). For now this uses the raster package (for disaggregate) and the tidyverse.

library(testthat)
library(tidyverse)
library(raster)
library(rgeos)   # for gBuffer
library(velox)

#####################
### FUNCTION
#####################

extr_weights <- function(x, cellnumbers=FALSE, sp, normalizeWeights=TRUE){
  ## give each original cell an index so we can aggregate back to it later
  x$cell <- 1:(nrow(x)*ncol(x))

  ## split every cell into 10 x 10 = 100 sub-cells and extract them with velox
  brk_100 <- disaggregate(x, fact=10)
  brk_100_vx <- velox(brk_100)
  vx.raw.df <- brk_100_vx$extract(sp, fun = NULL, df=TRUE) %>% as.tbl
  colnames(vx.raw.df)[-1] <- names(x)

  ## count sub-cells per (polygon, cell): n/100 is the fraction of the cell covered
  res <- vx.raw.df %>%
    count_(colnames(.)) %>%
    arrange(ID_sp, cell) %>%
    mutate(n=n/100) %>%
    rename(weight=n) %>%
    dplyr::select(ID_sp, cell, everything())

  ## optionally rescale weights so they sum to 1 within each polygon
  if(normalizeWeights) {
    res <- res %>%
      group_by(ID_sp) %>%
      mutate(weight=weight/sum(weight)) %>%
      ungroup()
  }

  if(!cellnumbers) res <- dplyr::select(res, -cell)
  res
}

#####################
### TEST
#####################

## Toy data: two 10 x 10 layers, as a RasterBrick and as a VeloxRaster
set.seed(0)
mat1 <- matrix(rnorm(100), 10, 10)
mat2 <- matrix(rnorm(100), 10, 10)
brk <- brick(raster(mat1), raster(mat2))
vx <- velox(list(mat1, mat2), extent=c(0,1,0,1), res=c(0.1,0.1),
            crs="+proj=longlat +datum=WGS84 +no_defs")

## Make SpatialPolygons
coord <- matrix(c(0,0,1,1, 0.5, 0.55), nrow=3, 2, byrow = TRUE)
spoint <- SpatialPoints(coords=coord)
spols <- gBuffer(spgeom=spoint, width=c(0.5, 0.5, 0.04), byid = TRUE)

## Extract raw values as data-frame

rs.raw.df_w_norm <- raster::extract(x = brk, y = spols, fun = NULL, df=TRUE, small=FALSE,
                                    cellnumbers=TRUE, weights=TRUE, normalizeWeights=TRUE) %>% as.tbl

rs.raw.df_w_unnorm <- raster::extract(x = brk, y = spols, fun = NULL, df=TRUE, small=FALSE,
                                    cellnumbers=TRUE, weights=TRUE, normalizeWeights=FALSE) %>% as.tbl

vx.raw.df_w_unnorm <- extr_weights(x=brk, cellnumbers=TRUE, sp=spols, normalizeWeights=FALSE) %>%
  rename(ID=ID_sp) %>%
  mutate(ID=as.double(ID))

vx.raw.df_w_norm <- extr_weights(x=brk, cellnumbers=TRUE, sp=spols, normalizeWeights=TRUE) %>%
  rename(ID=ID_sp) %>%
  mutate(ID=as.double(ID))

all.equal(vx.raw.df_w_norm, rs.raw.df_w_norm)
all.equal(vx.raw.df_w_unnorm, rs.raw.df_w_unnorm)

expect_true(all.equal(vx.raw.df_w_norm, rs.raw.df_w_norm))
expect_true(all.equal(vx.raw.df_w_unnorm, rs.raw.df_w_unnorm))
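
To get at the efficiency question directly, here is a hedged timing sketch on the toy data above (it assumes the microbenchmark package is installed; relative timings will of course depend on raster size and the number of polygons):

## Timing sketch: raster::extract with weights vs. the disaggregate-based helper
library(microbenchmark)
microbenchmark(
  raster = raster::extract(x = brk, y = spols, weights = TRUE, normalizeWeights = TRUE,
                           cellnumbers = TRUE, df = TRUE),
  velox  = extr_weights(x = brk, cellnumbers = TRUE, sp = spols, normalizeWeights = TRUE),
  times = 10
)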
hunzikp commented 6 years ago

For now, the disaggregate -> extract -> aggregate path is probably the best option. A clean implementation in velox would probably require implementing a C++ disaggregation method. This isn't at the top of my priority list right now, but I'll keep the issue open until it is.
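
For reference, the disaggregation step itself is conceptually just replicating every cell of a band matrix fact x fact times; in plain R that is a one-line Kronecker product. This is only a sketch of the idea, not part of the velox API (disaggregate_band is a made-up name):

## Sketch only: what disaggregating a single band boils down to.
## kronecker(m, ones) turns each cell m[i, j] into a fact x fact block of the same value.
disaggregate_band <- function(m, fact) {
  kronecker(m, matrix(1, fact, fact))
}

disaggregate_band(matrix(1:4, 2, 2), fact = 2)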