hunzikp / velox

https://hunzikp.github.io/velox/
119 stars 23 forks source link

velox slower than raster for extracting points from .adf rasters? #22

Closed pbaylis closed 6 years ago

pbaylis commented 6 years ago

Hello,

I was trying to use velox to extract points from a wildfire raster and noticed that velox was much slower than raster. I've reproduced this issue on both Linux Mint and OS X. Apologies for the long MWE:

library(raster)
library(velox)
library(sf)

# https://www.fs.usda.gov/rds/archive/Product/RDS-2015-0046
rast <- raster("w001001.adf")
vx <- velox("w001001.adf")

n <- 100
points.df <- data.frame(x=runif(n, extent(rast)@xmin, extent(rast)@xmax),
                        y=runif(n, extent(rast)@ymin, extent(rast)@ymax))
points.sf <- st_as_sf(points.df, coords = c("x", "y"), crs = proj4string(rast), agr = "constant")
points.sf$test <- 1
points.sp <- as(points.sf, "Spatial")

system.time(extract(rast, points.sp)) # 0.3s
system.time(vx$extract_points(points.sf)) # 145s
system.time(vx$extract_points(points.sp)) # 154s

# Velox is much faster if raster is resaved as .tif
writeRaster(rast, "w001001_fromRaster.tif")
vx3 <- velox("w001001_fromRaster.tif")
rast3 <- raster("w001001_fromRaster.tif")
system.time(vx3$extract_points(points.sf)) # ~0s
system.time(vx3$extract_points(points.sp)) # ~0s
system.time(extract(rast3, points.sp)) # 0.3s

As you can see at the bottom, my resolution at the moment is to resave as GeoTiff and reload, at which point velox is once again much faster on OSX and at least comparable to raster on Linux Mint. But I admit I'm a bit confused. Both systems have plenty of memory.

hunzikp commented 6 years ago

Hi there,

This is a tricky one, thanks for posting! The problem seems to arise if the raster data are stored as integers, rather than doubles. If you simply change the storage mode to double precision before extracting, the issue goes away:

library(raster)
library(velox)
library(sf)

## Get data
vx <- velox("w001001.adf")
rast <- raster("w001001.adf")

## Generate points
n <- 100
set.seed(0)
points.df <- data.frame(x=runif(n, extent(rast)@xmin, extent(rast)@xmax),
                        y=runif(n, extent(rast)@ymin, extent(rast)@ymax))
points.sf <- st_as_sf(points.df, coords = c("x", "y"), crs = proj4string(rast), agr = "constant")
points.sf$test <- 1

## Extract points
system.time(res1 <- vx$extract_points(points.sf))  # painfully slow

## Store raster data as double precision first
vx2 <- vx$copy()
storage.mode(vx2$rasterbands[[1]]) <- "double"
system.time(res2 <- vx2$extract_points(points.sf))  # fast

I'm not exactly sure why storing the data as integers causes such a slow-down; the way velox handles data types at the moment, the integer matrix should be converted to a double precision matrix by Rcpp once it's passed to C++, but evidently this isn't happening. I'll try to put together a fix for this particular issue later today and push it to GitHub. On a more general note, the way velox currently handles data types is unsatisfactory (integers are cast as doubles), and should probably be addressed more fundamentally in the long term.