hunzikp / velox

https://hunzikp.github.io/velox/

Performance Velox vs Raster : load raster #15

Closed: naub1n closed this issue 6 years ago

naub1n commented 6 years ago

Hi,

Thanks for this package. It is very powerful for extracting data. However, loading a raster is very slow compared to the raster package, so the full process with velox is no faster than with raster.

Here is an example:

### Install velox with devtools (v0.1.0.9004, which provides $extract_points())
install.packages("devtools")
library(devtools)
install_github("hunzikp/velox")
### Load library
library(raster)
library(velox)
library(sp)
library(rbenchmark)
### Download a Raster
tf <- tempfile()
td <- tempdir()
download.file(url = 'https://raw.githubusercontent.com/GeoScripting-WUR/IntroToRaster/gh-pages/data/gewata.zip', destfile = tf, method = 'auto')
unzip(tf, exdir = td)
lf <- list.files(path = td, pattern = '^.*\\.tif$', full.names = TRUE)
### Load Raster
benchmark(
  vx<-velox(lf),
  r<-raster(lf),
  replications = 10
)
#RESULT load raster:
#              test replications elapsed relative user.self sys.self user.child sys.child
# 2 r <- raster(lf)           10    0.11    1.000      0.09     0.00         NA        NA
# 1 vx <- velox(lf)           10    0.94    8.545      0.71     0.15         NA        NA
### Create Point
x_coord<-runif(10,r@extent@xmin,r@extent@xmax)
y_coord<-runif(10,r@extent@ymin,r@extent@ymax)
p<-SpatialPoints(data.frame(x_coord,y_coord))
### Extract value
benchmark(
  v_vx<-vx$extract_points(p),
  v_r<-raster::extract(r,p),
  replications = 10
)
#RESULT extract data:
#                           test replications elapsed relative user.self sys.self user.child sys.child
# 2 v_r <- raster::extract(r, p)           10     6.1      6.1      6.06     0.01         NA        NA
# 1 v_vx <- vx$extract_points(p)           10     1.0      1.0      0.50     0.50         NA        NA

Summary: velox is about 9x slower than raster at loading the raster, but about 6x faster at extracting data.

I have 4000 large ASC rasters and want to extract data from each one. Can velox load rasters faster, or should I change my code?

(Sorry for my bad English.)

hunzikp commented 6 years ago

Hi Gohan,

In your comparison, velox loads data more slowly than raster because by default, the raster function does not actually load the data. It only links to the file on disk. raster even tells us that this is the case if we print the 'loaded' object:

### Init
library(raster)
library(velox)
library(sp)
library(rbenchmark)

### Download a Raster
tf <- tempfile()
td <- tempdir()
download.file(url = 'https://raw.githubusercontent.com/GeoScripting-WUR/IntroToRaster/gh-pages/data/gewata.zip', destfile = tf, method = 'auto')
unzip(tf, exdir =  td)
lf <- list.files(path = td, pattern = '^.*\\.tif$', full.names = T)

### 'Load' raster
r <- raster(lf)
print(r)
# ...
# data source : /tmp/RtmpMV4msb/LE71700552001036SGS00_SR_Gewata_INT1U.tif 
# ...

If we force raster to read the file into memory, the 'data source' line reads 'in memory':

print(readAll(r))
# ...
# data source : in memory
# ...

If we compare actual loading times, we can see that velox slightly outperforms raster:

### Compare actual loading times
benchmark(
  velox = {vx <- velox(lf)},
  raster = {r <- readAll(raster(lf))},
  replications = 10
)
#    test replications elapsed relative user.self sys.self user.child sys.child
#2 raster           10   1.554    1.967     1.536    0.008          0         0
#1  velox           10   0.790    1.000     0.780    0.008          0         0

But what you really seem to be interested in is the full operation of loading and extracting the data. In your example, velox is actually still faster than raster on the combined operation: adding up the elapsed times of your two benchmarks, the absolute time for velox is 1.94s and for raster it is 6.21s. (You seem to be comparing the relative columns, which doesn't make much sense, because each one is normalized to the fastest entry within its own benchmark.)
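The totals quoted above can be reproduced by summing the elapsed times reported in the first post. A small sketch; the variable names are only illustrative, and the numbers are copied from those two benchmark tables (seconds, 10 replications each):

```r
# Elapsed times in seconds, copied from the two benchmarks in the first post
load_elapsed    <- c(velox = 0.94, raster = 0.11)
extract_elapsed <- c(velox = 1.0,  raster = 6.1)

# Combined time for loading plus extracting (velox: 1.94 s, raster: 6.21 s)
total_elapsed <- load_elapsed + extract_elapsed
print(total_elapsed)

# Ratio of the combined times: how much longer raster takes overall
print(total_elapsed[["raster"]] / total_elapsed[["velox"]])
```

Comparing these absolute sums is meaningful, whereas the relative columns of two separate benchmark calls are not on a common scale.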

However, a better way to evaluate whether velox is faster than raster on the combined operations is to benchmark them directly:

### Create Point
x_coord <- runif(10,r@extent@xmin, r@extent@xmax)
y_coord <- runif(10,r@extent@ymin, r@extent@ymax)
p <- SpatialPoints(data.frame(x_coord, y_coord))

### Benchmark load & extract jointly
benchmark(
  velox = {vx <- velox(lf); v_vx <- vx$extract_points(p)},
  raster = {r <- raster(lf); v_r <- raster::extract(r,p)},
  replications = 10
)
#    test replications elapsed relative user.self sys.self user.child sys.child
#2 raster           10   1.341    1.439     1.340    0.000          0         0
#1  velox           10   0.932    1.000     0.924    0.008          0         0
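
For the use case of many rasters mentioned in the original question, the same load-then-extract pattern could be wrapped in a loop over files. This is only a minimal sketch: `extract_all` is a hypothetical helper name, not part of velox, and it assumes `files` is a character vector of raster paths and `points` is a SpatialPoints object like `p` above.

```r
# Hypothetical helper (not part of velox): for each file, load the raster
# with velox and extract the values at the given points.
extract_all <- function(files, points) {
  results <- lapply(files, function(f) {
    vx <- velox::velox(f)        # reads the raster into memory
    vx$extract_points(points)    # same call as in the benchmark above
  })
  names(results) <- basename(files)  # label each result by its file name
  results
}

# Usage, with `lf` and `p` as defined above:
# vals <- extract_all(lf, p)
```

Each raster is loaded once and discarded after extraction, so memory use stays bounded by the largest single file rather than the whole collection.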

naub1n commented 6 years ago

OK, thanks, that is very interesting. But I have tried this on a bigger ASC raster:

### Init
library(raster)
library(velox)
library(sp)
library(rbenchmark)

### Download a Raster
lf <- tempfile()
td <- tempdir()
download.file(url = 'https://www.dropbox.com/s/guooescimjjk5ok/BDALTIV2_75M_FXX_0525_6600_MNT_LAMB93_IGN69.asc?dl=1', destfile = lf, method = 'auto')
### Create Point
r<-raster(lf)
x_coord <- runif(10,r@extent@xmin, r@extent@xmax)
y_coord <- runif(10,r@extent@ymin, r@extent@ymax)
p <- SpatialPoints(data.frame(x_coord, y_coord))

### Benchmark load & extract jointly
benchmark(
  velox = {vx <- velox(lf); v_vx <- vx$extract_points(p)},
  raster = {r <- raster(lf); v_r <- raster::extract(r,p)},
  replications = 50
)

#     test replications elapsed relative user.self sys.self user.child sys.child
# 2 raster           50   60.72    1.165     54.77     5.60         NA        NA
# 1  velox           50   52.12    1.000     43.85     7.39         NA        NA

So yes, velox is faster than raster, but not much faster in this example. I don't mean to say that velox is a bad package (not at all!); I just want to know the best way to use it. I use the parallel package in my code, and it reduces the processing time significantly.

Thanks a lot for your help and your work!