ranghetti commented 4 years ago

Hi, thank you for your package; it is very useful for speeding up raster aggregation. I made some changes to speed up extraction further through parallel computation. I only used the base package parallel (added as a suggested dependency), so as not to make the dependencies heavier. To enable this mode, parallel = TRUE must be passed explicitly, as in exact_extract(..., parallel = TRUE); the default is FALSE. I tested it on Windows (R 4.0.0) and Ubuntu (R 3.6.3). I hope you find this PR useful.

Bye, Luigi Ranghetti

dbaston commented 4 years ago

Thanks for this! I'm trying to test locally and am getting the following error:

r <- raster('/home/dan/data/gpw_v4_population_count_rev11_2015_30_sec.tif')
p <- st_read('/home/dan/data/tl_2019_us_state.shp')

system.time(x1 <- exact_extract(r, p, 'sum', parallel=FALSE)) # ok
system.time(x2 <- exact_extract(r, p, 'sum', parallel=TRUE)) 

# > x2[1]
# [1] "Error in CPP_stats(x, weights, wkb, fun, max_cells_in_memory) : \n  Evaluation error: Failure during raster IO\n.\n"

Any thoughts?

ranghetti commented 4 years ago

It runs on Linux (through parallel::mclapply()). I am using this reproducible example:

> brazil <- st_as_sf(getData('GADM', country='BRA', level=2))
trying URL ''
Content type 'text/html; charset=iso-8859-1' length 8687212 bytes (8.3 MB)
downloaded 8.3 MB

> prec <- getData('worldclim', var='prec', res=10)[[12]]
trying URL ''
Content type 'application/zip' length 5118786 bytes (4.9 MB)
downloaded 4.9 MB

> system.time(x1 <- exact_extract(prec, brazil, 'sum', parallel=FALSE)) # ok
  |==========================================================================================================================================================| 100%
   user  system elapsed 
  7.899   0.041   7.884 
> system.time(x2 <- exact_extract(prec, brazil, 'sum', parallel=TRUE)) 
   user  system elapsed 
 16.016   6.482   1.816 

> head(x1)
  sum.prec1 sum.prec2 sum.prec3 sum.prec4 sum.prec5 sum.prec6 sum.prec7 sum.prec8 sum.prec9 sum.prec10 sum.prec11 sum.prec12
1  1277.016  1281.993  1169.365  934.7251  500.5550  179.5607   95.9675  161.3256  441.1229   679.4379   1040.139   1065.075
2  2093.158  2136.923  1951.037 1653.9073  754.9750  251.3503  246.5819  420.4114  759.1218  1381.7677   2011.484   2115.916
3  3075.961  3042.831  2786.772 2375.2007  991.3964  264.9034  230.0675  489.6361 1102.3470  1849.4146   2801.739   2948.129
4  2921.722  2902.637  2500.209 1963.0975 1141.3306  519.9777  351.8989  412.0550 1001.0112  1618.1831   2116.434   2657.462
5  1442.984  1397.344  1177.999  899.9659  490.6928  216.4971  189.7692  209.0536  498.1832   845.2963   1060.703   1319.866
6  5597.812  5827.785  6691.600 5221.7266 3225.3943 1812.6852 1093.6027 1578.7748 2690.3335  4121.0293   5114.721   5815.449
> head(x2)
  sum.prec1 sum.prec2 sum.prec3 sum.prec4 sum.prec5 sum.prec6 sum.prec7 sum.prec8 sum.prec9 sum.prec10 sum.prec11 sum.prec12
1  1277.016  1281.993  1169.365  934.7251  500.5550  179.5607   95.9675  161.3256  441.1229   679.4379   1040.139   1065.075
2  2093.158  2136.923  1951.037 1653.9073  754.9750  251.3503  246.5819  420.4114  759.1218  1381.7677   2011.484   2115.916
3  3075.961  3042.831  2786.772 2375.2007  991.3964  264.9034  230.0675  489.6361 1102.3470  1849.4146   2801.739   2948.129
4  2921.722  2902.637  2500.209 1963.0975 1141.3306  519.9777  351.8989  412.0550 1001.0112  1618.1831   2116.434   2657.462
5  1442.984  1397.344  1177.999  899.9659  490.6928  216.4971  189.7692  209.0536  498.1832   845.2963   1060.703   1319.866
6  5597.812  5827.785  6691.600 5221.7266 3225.3943 1812.6852 1093.6027 1578.7748 2690.3335  4121.0293   5114.721   5815.449

On Windows, the previous example fails with:

 Evaluation error: Null external pointer

This is probably because an external pointer cannot be exported to cluster workers (I suppose this concerns the C methods used in CPP_stats and CPP_exact_extract). I am not an expert in this area, so I do not know how to work around it, nor why the example I used yesterday was working. If you think parallelisation on Unix systems would be useful, I can edit the pull request so that it switches to single-core mode on Windows.
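
For reference, the single-core fallback proposed above could look roughly like the following. This is only a sketch using base R and the parallel package, not the code in the pull request: apply_chunks is a hypothetical helper, and the chunking of feature indices is an assumption about how the work might be divided.

```r
library(parallel)

# Hypothetical helper (not part of exactextractr): apply `fun` to each chunk,
# forking with mclapply() on Unix-alikes and running serially on Windows,
# where forking is not available.
apply_chunks <- function(chunks, fun, cores = 2L) {
  if (.Platform$OS.type == "windows" || cores < 2L) {
    lapply(chunks, fun)
  } else {
    mclapply(chunks, fun, mc.cores = cores)
  }
}

# Split 10 feature indices into 3 contiguous chunks and process each chunk
idx <- seq_len(10)
chunks <- split(idx, cut(idx, 3, labels = FALSE))
res <- unlist(apply_chunks(chunks, function(i) i^2), use.names = FALSE)
```

Because the chunks are contiguous and both lapply() and mclapply() return results in input order, res follows the order of the original features.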

dbaston commented 4 years ago

Your example works for me. But if I write prec to disk and read it back (instead of bringing it in with getData), I get the same error.

brazil <- st_as_sf(getData('GADM', country='BRA', level=2))
prec <- getData('worldclim', var='prec', res=10)[[12]]

writeRaster(prec, '/tmp/prec.tif')
prec <- raster('/tmp/prec.tif')

system.time(x1 <- exact_extract(prec, brazil, 'sum', parallel=FALSE))
system.time(x2 <- exact_extract(prec, brazil, 'sum', parallel=TRUE))

Here's mine

 V ── Loaded and on-disk version mismatch.
 P ── Loaded and on-disk path mismatch.

ranghetti commented 4 years ago

This is another situation that I had not encountered (I used in-memory rasters for my tests). I need time to investigate whether a solution can be found; if I find one, I will reopen the pull request. Sorry for wasting your time.
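
The two cases discussed in this thread (an in-memory raster versus one read back from a file) can be told apart with the raster package's inMemory() and fromDisk(). The sketch below uses a small synthetic raster as a stand-in for prec and writes it in raster's native .grd format. Loading the values with readAll() is only an assumed workaround for rasters that fit in memory; it has not been checked against the code in this pull request.

```r
library(raster)

# Small in-memory raster standing in for `prec`
r_mem <- raster(nrows = 10, ncols = 10)
values(r_mem) <- seq_len(ncell(r_mem))

# Write it in raster's native .grd format and read it back as a file-backed object
path <- tempfile(fileext = ".grd")
writeRaster(r_mem, path)
r_disk <- raster(path)

inMemory(r_mem)   # are the cell values held in the R session?
fromDisk(r_disk)  # are the cell values read lazily from the file?

# Assumed workaround, for small rasters only: load the values into memory
# before handing the raster to forked workers
r_loaded <- readAll(r_disk)
```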