hunzikp / velox

https://hunzikp.github.io/velox/
119 stars 23 forks source link

Extract dramatically slow in version 0.2.0 #25

Open pascaloettli opened 6 years ago

pascaloettli commented 6 years ago

Hello,

While recently re-running a script utilizing the incredible power of velox, I noticed a dramatic slow down in the execution of this script. It appears that in-between the package velox has been update from version 0.1.0 to 0.2.0.

Basically, the script takes a raster of Digital Elevation Model (DEM) in GeoTIFF format and extract the mean elevation in 9 squared windows around stations. As a result, a matrix of Nx9 mean elevations.

Please find below an simple example summarizing the problem. I hope it might help you to solve this issue.

libs <- c("raster", "rgeos", "rgdal", "sp", "velox", "profvis")
lapply(libs, library, character.only = TRUE)

set.seed(42)

# Large projected raster, mimicking a 
dem <- raster(nrows = 12094, ncols = 12057, xmn = -331805.4, ymn = -47.6398, xmx = 778385.9, ymx = 1113551, crs = "+proj=omerc +lat_0=4 +lonc=102.25 +alpha=323.0257964666666 +k=0.99984 +x_0=804671 +y_0=0 +no_uoff +gamma=323.1301023611111 +ellps=GRS80 +units=m +no_defs", resolution = c(92.07857, 92.07857))
dem <- setValues(dem, 1:ncell(dem))

# One random spatial point, can be more
nb <- 1
p <- as(sampleRandom(dem, nb, xy = TRUE, sp = TRUE), "SpatialPoints")

dem.velox <- velox(dem)

# Side of the main window in meters
R <- 11000

# Half side of sub-window, i.e. R/(3*2)
r <- R/6

# Repeating the coordinates of the point
P <- matrix(rep(p@coords, each = 9), 9, 2)

# Creating the matrix to shift the original coordinates
# This will give the coordinates of the centroid of each polygon
S <- matrix(c(-2, 2, 0, 2, 2, 2, -2, 0, 0, 0, 2, 0, -2, -2, 0, -2, 2, -2), 9, 2, byrow = TRUE)

# Matrix of centroids (original point + 8 new centroids)
centroids <- P + S*r
centroids <- SpatialPoints(centroids)
projection(centroids) <- projection(p)

# Finally the 9 sub-windows from which the means will be extracted
buffers <- gBuffer(centroids, width = r, byid = TRUE, quadsegs = 1, capStyle = "SQUARE")

# Illustration
plot(buffers)
points(p, pch = "+", cex = 2)
points(centroids)

image

velox version 0.1.0

system.time(dem.velox$extract(buffers, mean))

user system elapsed 0.072 0.000 0.073

# Profiling the extraction
profvis(dem.velox$extract(buffers, mean))

The extraction of 9 polygons takes ~40ms to run.

velox version 0.2.0

system.time(dem.velox$extract(buffers, mean))

user system elapsed 3.140 15.828 18.972

# Profiling the extraction
profvis(dem.velox$extract(buffers, mean))

The extraction of 9 polygons takes ~16000ms to run, due to boost, boost.VeloxRaster, boostFactory$makePointGrid and .External taking long time to run.

nbarsch commented 6 years ago

I think this is the same issue I am having.

hunzikp commented 6 years ago

Thx for the feedback. I think I identified the issue; there's some sub-optimal logic involved if you're intersecting a very large raster with a small polygon collection. For the time being, check whether the legacy = TRUE option helps you out (which prompts velox to use 0.1.0 code). I'll have a go at the main issue in the mean time.

nbarsch commented 6 years ago

Thanks hunzikp. I have really liked the performance gains I've gotten from velox since using it before it was on cran, I'm just glad I figured out to get them back again! Cheers.

pascaloettli commented 6 years ago

@hunzikp Yes, using legacy = TRUE helps for now. Thank you for the suggestion.