rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
539 stars 89 forks source link

Run in parallel? #36

Open liushlcn opened 4 years ago

liushlcn commented 4 years ago

Is it possible to run in parallel? I have tried to run in parallel for calculation, I faced with the error message "NULL value passed as symbol address"

rhijmans commented 4 years ago

I think you cannot send SpatRaster or SpatVector objects to nodes as these objects cannot be serialized. So you need to to create the objects in the nodes. My plan it to make this all easy by providing built-in support for parallelization but that is not the highest priority right now; but I expect to add that this year.

hrvg commented 4 years ago

I have been running some code in parallel using a combination of both terra and raster packages: pre- and post- processing with terra and parallel compute with raster. The whole code would run faster if it was fully on terra and I am adding my voice to the wish for addition of built-in parallelization support.

rhijmans commented 4 years ago

Which terra functions would be your priority?

hrvg commented 4 years ago

I am mainly using terra::aggregate(fun = mean), terra::aggregate(fun = sd) and terra::crop() to derive statistical roughness metrics. I have worked around using raster, foreach and doParallel.

clairedavies commented 3 years ago

I seem to be having the same issue with terra::predict not working in parallel, same error of NULL value passed as symbol address, runs fine otherwise. Built in support for parallelisation of the predict function would be awesome if you could.

rhijmans commented 3 years ago

See https://github.com/rspatial/terra/issues/96 for built in support for predict.

You can use also pack and unpack to make serializable SpatRaster and SpatVector objects (may not work for more complex structures, but I think it works for the basic ones

library(terra)
s <- rast(system.file("ex/logo.tif", package="terra"))   
p <- pack(s)
p
#[1] "This is a PackedSpatRaster object. Use 'rast()' to unpack it"

# send somewhere and unpack
x <- rast(p)
x
#class       : SpatRaster 
#dimensions  : 77, 101, 3  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#source      : memory 
#names       : lyr.1, lyr.2, lyr.3 
#min values  :     0,     0,     0 
#max values  :   255,   255,   255 

This will load all values into memory, and I am thinking of a method that will do automatic tiling.

if you have multiple files, you can of course parallelize over filenames (create the objects on the cores)

The C++ level support will not come very soon --- perhaps sometime next year. In part because this requires C++14 and I am not sure if R is quite there yet.

clairedavies commented 3 years ago

Awesome thanks, will give that a go,

From: Robert Hijmans notifications@github.com Sent: Tuesday, 8 December 2020 3:40 PM To: rspatial/terra terra@noreply.github.com Cc: Davies, Claire (O&A, Hobart) Claire.Davies@csiro.au; Comment comment@noreply.github.com Subject: Re: [rspatial/terra] Run in parallel? (#36)

See #96https://github.com/rspatial/terra/issues/96 for built in support for predict.

You can use also pack and unpack to make serializable SpatRaster and SpatVector objects (may not work for more complex structures, but I think it works for the basic ones

library(terra)

s <- rast(system.file("ex/logo.tif", package="terra"))

p <- pack(p)

send somewhere

x <- unpack(p)

This will load all values into memory, and I am thinking of a method that will do automatic tiling.

if you have multiple files, you can of course parallelize over filenames (create the objects on the cores)

The C++ level support will not come very soon --- perhaps sometime next year. In part because this requires C++14 and I am not sure if R is quite there yet.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/rspatial/terra/issues/36#issuecomment-740370734, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFBCV5W2GQEWUTXCLGHABY3STWU25ANCNFSM4NERKWOQ.

rhijmans commented 3 years ago

more examples for parallelization of terra::predict: https://github.com/rspatial/terra/issues/178

rCarto commented 3 years ago

For future reference pack() has been renamed to wrap() since version 1.2-5. https://rspatial.github.io/terra/reference/wrap.html

chrishaak commented 2 years ago

I'd like to add "extract" to the wish-list for future parallelization;

In the past, using raster::extract and "mcapply" (from the parallel library), I could extract values for a vector of coordinates (mycoords), each from a specified layer of a raster stack (as defined in the vector mylayers) across many cores.

myoutput<-unlist(mclapply(1:length(mycoords), function(i){extract(mybrick, mycoords[i], layer=mylayers[i], nl=1)}, mc.cores=ncores))

While executed on a single core, "terra::extract" represents a big speed gain over raster::extract, but it still gets bogged down when you have a lot of layers (i.e., 10000+).... making the parallelized "raster::extract" with mclapply considerably faster (assuming the use of several cores). Perhaps there is a similar way to approach parallelization for terra::extract?

xbailleau commented 2 years ago

I have written a satellite processing chain using raster package and parallelized the process to run a crop model over many fields, I have updated all the processes to be compatible with terra package, and damned my processes do not work now in parallel mode. Anyway I lost a lot of time but should have search before about the fact that spatRaster do not appreciate parallel mode, I have now to go back to raster package because my PC has 24 cores and even if terra is faster, 24 cores will be more efficient.

rhijmans commented 2 years ago

It is difficult to comment since I do not know how you implemented this. But in your case I do not see much of a problem. For example, you can send a chunk of weather data (matrix or data.frame) to each core and get the model results back. For example by parallelizing the function you use with app.

rhijmans commented 2 years ago

More here: https://stackoverflow.com/a/67449818/635245

xbailleau commented 2 years ago

Thank you very much for your answer and for sharing your packages, in the different cores, I do all the process (reading the jp2 files, calculate LAI, assimilate LAI in the crop model and generate yield map) using 24 cores, I can run 24 fields at once. I have now two possibilities (using raster and terra) I now add the way to use stars package. May be there would exist an other way to process but the code is built like that and runs fine using raster package, rebuild all the process would be quite lot of time to spent. I will check your link to stackflow, thx

aloboa commented 2 years ago

It would be great having some examples of running terra in parallel for those functions in which it is possible already. Perhaps even some examples in https://rspatial.org/terra/spatial/index.html

ailich commented 2 years ago

I would also like to add focal and focalCpp to the wishlist since focal operations can be computationally expensive (and I use them a lot 😅).

bitbacchus commented 2 years ago

I would also like to add focal and focalCpp to the wishlist since focal operations can be computationally expensive (and I use them a lot sweat_smile).

This would be very beneficial!

aloboa commented 2 years ago

and zonal()

SimonDedman commented 1 year ago

I'd like to throw distance into the ring as well - feels like something that should be parallel by design, as it seems like the underlying processing lends itself to being embarassingly parallel? (compute the same distance calculation on every pixel one by one)

dmi3kno commented 1 year ago

I want to suggest the default value for cores=getOption("mc.cores"). This is pretty standard in the parallel package and elsewhere in the R ecosystem.

ailich commented 11 months ago

I tried breaking rasters into chunks and using a parallelized version of lapply from the future.apply package. The idea here is breaking the raster into chunks and sending each chunk to a core, and then merging the results back together into a single object. The code works in serial with a standard lapply but I get an error in parallel. Based on the error message it seems like the it is potentially related to SpatRasters being C++ pointers to the data. Would it be possible to get something like this working, or is there any guidance on how to implement parallel processing in terra?

library(terra)
#> terra 1.7.47
library(future)
library(future.apply)
library(dplyr)

# Set up
set.seed(5)
r<-rast(matrix(data = sample(1:100, size = 1000, replace = TRUE), nrow = 100, ncol=100))
ncores<- 5
w<- c(3,3)
plot(r)


# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(start = ceiling(seq(1, nrow(r), length.out = ncores+1)))
breaks<- breaks %>% mutate(end=lead(start, n=1)+buffer)
breaks$end[breaks$end > nrow(r)]<- nrow(r)
breaks<- breaks[-nrow(breaks),]

# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- r[breaks$start[i]:breaks$end[i], ,drop=FALSE]
}

# In serial
res_list<- lapply(r_list, FUN = focal, fun= "mean", na.rm=FALSE) #Apply focal to each list element
res<- do.call(terra::merge, res_list) #Merge back into single raster
plot(res)


#In Parallel
plan(strategy = "multisession", workers=ncores) #Set up parallel
res_list2<- future_lapply(r_list, FUN = focal, fun= "mean", w=w, na.rm=FALSE)
#> Error in .External(structure(list(name = "CppMethod__invoke_notvoid", : NULL value passed as symbol address

Created on 2023-11-14 with reprex v2.0.2

ailich commented 11 months ago

I saw that earlier in the thread wrap and unwrap were mentioned. I can get parallelization working if I wrap each chunk before sending it out to a core and wrapping the output from each core. Then those need to each be unwrapped and merged. Here's an example with focal

library(terra)
#> terra 1.7.62
library(future)
library(future.apply)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#> 
#>     intersect, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Custom Function
combine_raster_chunks<- function(x, buffer){
  for (i in 1:length(x)) {
    if(i!=1){x[[i]]<- x[[i]][(buffer+1):nrow(x[[i]]), , , drop=FALSE]} #Trim top of chunk
    if(i!=length(x)){x[[i]]<- x[[i]][1:(nrow(x[[i]])-buffer), , , drop=FALSE]} #Trim bottom of chunk
  }
  out<- do.call(merge, x)
  return(out)
}

# Set up
set.seed(5)
r<-rast(matrix(data = sample(1:100, size = 1000, replace = TRUE), nrow = 100, ncol=100))
ncores<- 5
w<- c(7,7)
plot(r)


# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(write_start = ceiling(seq(1, nrow(r)+1, length.out = ncores+1)))
breaks<- breaks %>% mutate(write_end=lead(write_start, n=1)-1)
breaks<- breaks %>% mutate(chunk_start=write_start - buffer)
breaks<- breaks %>% mutate(chunk_end = write_end + buffer)
breaks<- breaks[-nrow(breaks),]
breaks$chunk_end[breaks$chunk_end > nrow(r)]<- nrow(r)
breaks$chunk_start[breaks$chunk_start < 1]<- 1

# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- wrap(r[breaks$chunk_start[i]:breaks$chunk_end[i], ,drop=FALSE])
} #SpatRasters need to be wrapped before sending out to different cores

focal_parallel<- function(x, ...){
  wrap(focal(unwrap(x),...)) #unwrap for processing and then wrap output
}

#In Parallel
plan(strategy = "multisession", workers=ncores) #Set up parallel
out_list<- future_lapply(r_list, FUN = focal_parallel, fun= "mean", w=w, na.rm=TRUE)
plan(strategy = "sequential")
out_list<- lapply(out_list, unwrap) #unwrap chunks
parallel<- combine_raster_chunks(out_list, buffer) #merge chunks into single raster
plot(parallel)


#In serial
serial<- focal(r, w=w, fun="mean")

#Comparison
serial-parallel
#> class       : SpatRaster 
#> dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
#> coord. ref. :  
#> source(s)   : memory
#> name        : focal_mean 
#> min value   :          0 
#> max value   :          0

Created on 2023-12-08 with reprex v2.0.2

ratnanil commented 3 months ago

I would also like to add focal and focalCpp to the wishlist since focal operations can be computationally expensive (and I use them a lot 😅).

focal always hits me like a brick wall... So yes, seconding to adding focal to the wish list!