isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/

Implement parallelised execution #29

Closed: ranghetti closed this pull request 4 years ago

ranghetti commented 4 years ago

Hi, thank you for your package; it is very useful for speeding up raster aggregation. I made some changes to speed up extraction further by exploiting parallel computation. I used only the base package parallel (added as a suggested dependency) so as not to make the dependencies heavier. To use this mode, parallel = TRUE must be passed explicitly, as in exact_extract(..., parallel = TRUE); the default is FALSE. I tested it on Windows (R 4.0.0) and Ubuntu (R 3.6.3). I hope you find this PR useful.

Bye, Luigi Ranghetti
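
The PR's actual implementation is not shown in this thread. As a rough sketch of the general idea (splitting the features into chunks and processing each chunk in a forked worker with base parallel), something like the following could be used. `chunked_apply` and its arguments are hypothetical names introduced here for illustration, and the worker is a stand-in, not a call into exactextractr.

```r
library(parallel)

# Hypothetical helper (not part of exactextractr): split the rows of
# `features` into contiguous chunks, run `worker` on each chunk in a
# separate process, and return the results in the original row order.
chunked_apply <- function(features, worker, cores = 2L) {
  n <- nrow(features)
  k <- max(1L, min(as.integer(cores), n))
  # Assign each row to one of k contiguous chunks.
  chunk_id <- rep(seq_len(k), each = ceiling(n / k), length.out = n)
  chunks <- split(seq_len(n), chunk_id)
  # mclapply() relies on forking, so only one core can be used on Windows.
  mc_cores <- if (.Platform$OS.type == "windows") 1L else k
  results <- mclapply(chunks,
                      function(idx) worker(features[idx, , drop = FALSE]),
                      mc.cores = mc_cores)
  # mclapply() returns failed jobs as "try-error" objects instead of
  # stopping, so check for them explicitly.
  failed <- vapply(results, inherits, logical(1), what = "try-error")
  if (any(failed)) stop(as.character(results[[which(failed)[1]]]))
  unlist(results, use.names = FALSE)
}

# Stand-in data and worker, for illustration only.
features <- data.frame(v = 1:5)
chunked_apply(features, function(chunk) chunk$v * 2, cores = 2L)
# 2 4 6 8 10
```

With exactextractr, the worker could plausibly be `function(chunk) exact_extract(r, chunk, 'sum', progress = FALSE)` for a raster `r` and an sf object passed as `features`; that combination is an assumption and was not tested here.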

dbaston commented 4 years ago

Thanks for this! I'm trying to test locally and am getting the following error:

library(raster)
library(sf)
library(exactextractr)

r <- raster('/home/dan/data/gpw_v4_population_count_rev11_2015_30_sec.tif')
p <- st_read('/home/dan/data/tl_2019_us_state.shp')

system.time(x1 <- exact_extract(r, p, 'sum', parallel=FALSE)) # ok
system.time(x2 <- exact_extract(r, p, 'sum', parallel=TRUE))

# > x2[1]
# [1] "Error in CPP_stats(x, weights, wkb, fun, max_cells_in_memory) : \n  Evaluation error: Failure during raster IO\n.\n"

Any thoughts?

ranghetti commented 4 years ago

It runs on Linux (through parallel::mclapply()). I am using this reproducible example:

> brazil <- st_as_sf(getData('GADM', country='BRA', level=2))
trying URL 'https://biogeo.ucdavis.edu/data/gadm3.6/Rsp/gadm36_BRA_2_sp.rds'
Content type 'text/html; charset=iso-8859-1' length 8687212 bytes (8.3 MB)
==================================================
downloaded 8.3 MB

> prec <- getData('worldclim', var='prec', res=10)[[12]]
trying URL 'https://biogeo.ucdavis.edu/data/climate/worldclim/1_4/grid/cur/prec_10m_bil.zip'
Content type 'application/zip' length 5118786 bytes (4.9 MB)
==================================================
downloaded 4.9 MB

> system.time(x1 <- exact_extract(prec, brazil, 'sum', parallel=FALSE)) # ok
  |==========================================================================================================================================================| 100%
   user  system elapsed 
  7.899   0.041   7.884 
> system.time(x2 <- exact_extract(prec, brazil, 'sum', parallel=TRUE)) 
   user  system elapsed 
 16.016   6.482   1.816 

> head(x1)
  sum.prec1 sum.prec2 sum.prec3 sum.prec4 sum.prec5 sum.prec6 sum.prec7 sum.prec8 sum.prec9 sum.prec10 sum.prec11 sum.prec12
1  1277.016  1281.993  1169.365  934.7251  500.5550  179.5607   95.9675  161.3256  441.1229   679.4379   1040.139   1065.075
2  2093.158  2136.923  1951.037 1653.9073  754.9750  251.3503  246.5819  420.4114  759.1218  1381.7677   2011.484   2115.916
3  3075.961  3042.831  2786.772 2375.2007  991.3964  264.9034  230.0675  489.6361 1102.3470  1849.4146   2801.739   2948.129
4  2921.722  2902.637  2500.209 1963.0975 1141.3306  519.9777  351.8989  412.0550 1001.0112  1618.1831   2116.434   2657.462
5  1442.984  1397.344  1177.999  899.9659  490.6928  216.4971  189.7692  209.0536  498.1832   845.2963   1060.703   1319.866
6  5597.812  5827.785  6691.600 5221.7266 3225.3943 1812.6852 1093.6027 1578.7748 2690.3335  4121.0293   5114.721   5815.449
> head(x2)
  sum.prec1 sum.prec2 sum.prec3 sum.prec4 sum.prec5 sum.prec6 sum.prec7 sum.prec8 sum.prec9 sum.prec10 sum.prec11 sum.prec12
1  1277.016  1281.993  1169.365  934.7251  500.5550  179.5607   95.9675  161.3256  441.1229   679.4379   1040.139   1065.075
2  2093.158  2136.923  1951.037 1653.9073  754.9750  251.3503  246.5819  420.4114  759.1218  1381.7677   2011.484   2115.916
3  3075.961  3042.831  2786.772 2375.2007  991.3964  264.9034  230.0675  489.6361 1102.3470  1849.4146   2801.739   2948.129
4  2921.722  2902.637  2500.209 1963.0975 1141.3306  519.9777  351.8989  412.0550 1001.0112  1618.1831   2116.434   2657.462
5  1442.984  1397.344  1177.999  899.9659  490.6928  216.4971  189.7692  209.0536  498.1832   845.2963   1060.703   1319.866
6  5597.812  5827.785  6691.600 5221.7266 3225.3943 1812.6852 1093.6027 1578.7748 2690.3335  4121.0293   5114.721   5815.449

> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  it_IT.UTF-8                 
 ctype    it_IT.UTF-8                 
 tz       Europe/Rome                 
 date     2020-05-12                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version date       lib source        
 assertthat      0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
 class           7.3-16  2020-03-25 [1] CRAN (R 3.6.3)
 classInt        0.4-3   2020-04-07 [1] CRAN (R 3.6.3)
 cli             2.0.2   2020-02-28 [1] CRAN (R 3.6.3)
 codetools       0.2-16  2018-12-24 [1] CRAN (R 3.6.1)
 crayon          1.3.4   2017-09-16 [2] CRAN (R 3.5.1)
 DBI             1.1.0   2019-12-15 [1] CRAN (R 3.6.2)
 e1071           1.7-3   2019-11-26 [1] CRAN (R 3.6.1)
 exactextractr * 0.2.2   2020-05-12 [1] local         
 fansi           0.4.1   2020-01-08 [1] CRAN (R 3.6.2)
 glue            1.4.0   2020-04-03 [1] CRAN (R 3.6.3)
 KernSmooth      2.23-16 2019-10-15 [1] CRAN (R 3.6.1)
 lattice         0.20-41 2020-04-02 [1] CRAN (R 3.6.3)
 magrittr        1.5     2014-11-22 [1] CRAN (R 3.6.1)
 packrat         0.5.0   2018-11-14 [1] CRAN (R 3.6.1)
 raster        * 3.1-5   2020-04-19 [1] CRAN (R 3.6.3)
 Rcpp            1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)
 rgdal           1.4-8   2019-11-27 [1] CRAN (R 3.6.3)
 rgeos           0.5-2   2019-10-03 [1] CRAN (R 3.6.2)
 rstudioapi      0.11    2020-02-07 [1] CRAN (R 3.6.2)
 sessioninfo     1.1.1   2018-11-05 [2] CRAN (R 3.5.1)
 sf            * 0.9-2   2020-04-14 [1] CRAN (R 3.6.3)
 sp            * 1.4-1   2020-02-28 [1] CRAN (R 3.6.3)
 units           0.6-6   2020-03-16 [1] CRAN (R 3.6.3)
 withr           2.2.0   2020-04-20 [1] CRAN (R 3.6.3)

[1] /home/lranghetti/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

On Windows, I run into a problem with the previous example:

 Evaluation error: Null external pointer

This is probably because an external pointer cannot be exported to the cluster workers (I suppose the C methods used in CPP_stats and CPP_exact_extract rely on one). I am not an expert in this, so I do not know how to work around it, nor why the example I used yesterday worked. If you think parallelisation on Unix systems would still be useful, I can edit the pull request so that it falls back to single-core mode on Windows.
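
The single-core fallback on Windows proposed above could look roughly like this. It is a minimal sketch using only base R; `pick_cores` is a hypothetical helper name and does not come from the PR.

```r
library(parallel)

# Hypothetical helper: decide how many worker processes to request.
pick_cores <- function(requested = detectCores(),
                       os_type = .Platform$OS.type) {
  # detectCores() can return NA; treat that (or a bad value) as one core.
  if (is.na(requested) || requested < 1L) requested <- 1L
  # Forking is unavailable on Windows, so mclapply() must run single-core there.
  if (identical(os_type, "windows")) 1L else as.integer(requested)
}

pick_cores(4, os_type = "windows")  # 1
pick_cores(4, os_type = "unix")     # 4
```

The result could then be passed as `mc.cores` to `parallel::mclapply()`, so that the same call runs serially on Windows and in parallel elsewhere.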

dbaston commented 4 years ago

Your example works for me. But if I write prec to disk and read it back (instead of bringing it in with getData), I get the same error.

brazil <- st_as_sf(getData('GADM', country='BRA', level=2))
prec <- getData('worldclim', var='prec', res=10)[[12]]

writeRaster(prec, '/tmp/prec.tif')
prec <- raster('/tmp/prec.tif')

system.time(x1 <- exact_extract(prec, brazil, 'sum', parallel=FALSE))
system.time(x2 <- exact_extract(prec, brazil, 'sum', parallel=TRUE))
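
To make the distinction between a raster held in memory and one read back from disk concrete, the raster package's `inMemory()` reports whether a raster's values are held in RAM. The sketch below uses a small synthetic raster; `can_fork_safely` is a hypothetical guard, and this thread does not establish that the in-memory/on-disk difference is the actual cause of the error.

```r
library(raster)

# Hypothetical guard (not from the PR): allow the parallel path only when
# the raster's values are already held in RAM.
can_fork_safely <- function(x) {
  raster::inMemory(x)
}

# A small in-memory raster built from a matrix.
r_mem <- raster(matrix(1:16, nrow = 4))

# The same values written to a temporary file in raster's native format
# and read back, which gives a file-backed raster.
path <- tempfile(fileext = ".grd")
writeRaster(r_mem, path)
r_disk <- raster(path)

can_fork_safely(r_mem)   # TRUE
can_fork_safely(r_disk)  # FALSE
```

Calling `readAll()` on a file-backed raster loads its values into memory. Whether doing so would avoid the error reported here is untested.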

Here's my session info:

> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.0 (2020-04-24)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language en_US                       
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2020-05-12                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────
 !  package       * version date       lib source        
    assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
    backports       1.1.6   2020-04-05 [1] CRAN (R 4.0.0)
    callr           3.4.3   2020-03-28 [1] CRAN (R 4.0.0)
    class           7.3-17  2020-04-26 [1] CRAN (R 4.0.0)
    classInt        0.4-3   2020-04-07 [1] CRAN (R 4.0.0)
    cli             2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
    codetools       0.2-16  2018-12-24 [1] CRAN (R 4.0.0)
    crayon          1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
    DBI             1.1.0   2019-12-15 [1] CRAN (R 4.0.0)
    desc            1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
    devtools        2.3.0   2020-04-10 [1] CRAN (R 4.0.0)
    digest          0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
    e1071           1.7-3   2019-11-26 [1] CRAN (R 4.0.0)
    ellipsis        0.3.0   2019-09-20 [1] CRAN (R 4.0.0)
 VP exactextractr * 0.2.2   2020-05-07 [?] local         
    fansi           0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
    fs              1.4.1   2020-04-04 [1] CRAN (R 4.0.0)
    glue            1.4.0   2020-04-03 [1] CRAN (R 4.0.0)
    KernSmooth      2.23-17 2020-04-26 [1] CRAN (R 4.0.0)
    lattice         0.20-41 2020-04-02 [1] CRAN (R 4.0.0)
    magrittr        1.5     2014-11-22 [1] CRAN (R 4.0.0)
    memoise         1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
    pkgbuild        1.0.7   2020-04-25 [1] CRAN (R 4.0.0)
    pkgload         1.0.2   2018-10-29 [1] CRAN (R 4.0.0)
    prettyunits     1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
    processx        3.4.2   2020-02-09 [1] CRAN (R 4.0.0)
    ps              1.3.2   2020-02-13 [1] CRAN (R 4.0.0)
    R6              2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
    raster        * 3.1-5   2020-04-19 [1] CRAN (R 4.0.0)
    Rcpp            1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
    remotes         2.1.1   2020-02-15 [1] CRAN (R 4.0.0)
    rgdal           1.4-8   2019-11-27 [1] CRAN (R 4.0.0)
    rgeos           0.5-3   2020-05-08 [1] CRAN (R 4.0.0)
    rlang           0.4.6   2020-05-02 [1] CRAN (R 4.0.0)
    rprojroot       1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
    rstudioapi      0.11    2020-02-07 [1] CRAN (R 4.0.0)
    sessioninfo     1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
    sf            * 0.9-3   2020-05-04 [1] CRAN (R 4.0.0)
    sp            * 1.4-1   2020-02-28 [1] CRAN (R 4.0.0)
    testthat      * 2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
    units           0.6-6   2020-03-16 [1] CRAN (R 4.0.0)
    usethis         1.6.1   2020-04-29 [1] CRAN (R 4.0.0)
    withr           2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
    yaml            2.2.1   2020-02-01 [1] CRAN (R 4.0.0)

[1] /home/dan/R/x86_64-pc-linux-gnu-library/4.0
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

 V ── Loaded and on-disk version mismatch.
 P ── Loaded and on-disk path mismatch.

ranghetti commented 4 years ago

This is another situation I had not encountered (I used in-memory rasters for my tests). I need time to investigate whether a solution can be found; if I find one, I will reopen the pull request. Sorry for wasting your time.