Hi Andrew,
I think @mhesselbarth can say more about this, but as landscapemetrics is written in a tidy way, I believe the future packages would be a good match.
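For instance, here is a minimal, hypothetical sketch of the pattern, using the augusta_nlcd demo landscape that ships with landscapemetrics (plan() selects the parallel backend and furrr::future_map() runs the iterations on the workers):
library(landscapemetrics)
library(raster)
library(future)
library(furrr)
# minimal, hypothetical sketch of the future/furrr pattern
data("augusta_nlcd") # demo landscape shipped with landscapemetrics
future::plan(multisession, workers = 2) # parallel backend; see ?future::plan
# compute a few landscape-level metrics, one metric per parallel task
result <- furrr::future_map(c("lsm_l_ta", "lsm_l_ent", "lsm_l_shdi"),
                            function(m) calculate_lsm(landscape = augusta_nlcd,
                                                      what = m))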
Anyway, I have two questions: 1) Are you running this on your local PC or on an HPC cluster, and if the latter, which scheduler does it use? 2) Could you post a snippet of the code you are running?
Best, Sebastian
Hi Sebastian @bitbacchus ,
Thanks for reaching out today. I did not know that the future packages could potentially improve the processing time, and I am curious to learn more; I am relatively new to parallel computing in R. It's also good to know about the tidy coding behind the package, so thank you for that clarification as well.
To answer your questions: 1) I have tried both approaches (local PC and on-campus HPC), and I have been relying more on the HPC of late. I believe that the scheduler in use is SLURM.
2) Here is a snippet of code from one of my recent HPC submissions (I submit the job via SLURM with a separate .sbatch script):
##################################
## R Settings ##
##################################
#Packages and library calls
library(landscapemetrics)
library(rgeos)
library(rgdal)
library(raster)
library(snow)
library(sp)
set.seed(999)
#Working directory on HPC
homedir <- "/mnt/home/user/Documents/LSM-2020/"
setwd(homedir)
#Global option: prohibit printing output values in terms of scientific notation
options(scipen=999)
#Get today's date for use with workspace save later
datestr <- format(Sys.time(), "%Y-%m-%d")
##################################
## Preliminary Analysis ##
##################################
nt <- 7 #Number of cores requested
#Load geographic data
rasterPath <- "/mnt/home/user/Documents/LSM-2020/CEC 2015/NA_NALCMS_2015_LC_30m_LAEA_mmu5pix_.tif" #Land-use land-cover dataset
r <- raster(rasterPath) #Raster spans North America
pts <- readRDS("/mnt/home/user/Documents/LSM-2020/pts.rds") #>2,000 landscape centroids in US and CAN
#For choosing metrics, see https://r-spatialecology.github.io/landscapemetrics/reference/list_lsm.html
metricsAll <- list_lsm(level="landscape") #View available metrics for preliminary selection
#Subset the 8 landscape-level metrics of interest
selMetrics <- c("area_mn", "contag", "ed", "enn_mn", "np", "pd", "pr", "shdi")
listMet <- metricsAll[metricsAll$metric %in% selMetrics, ]
as.data.frame(listMet[, "metric"]) #Confirm the selection
beginCluster(n=nt) #Assuming raster::cluster( ) functions can help optimize the next call
all.class <- sample_lsm(landscape=r, y=pts, plot_id=pts$Centroids,
                        shape="circle", size=39428.93,
                        level="landscape", metric=listMet$metric) #Compute 8 landscape-level metrics for each of the >2,000 survey-area landscapes
endCluster() #Close use of the cluster
warnings()
all.class <- as.data.frame(all.class) #Convert result to classic data frame
################################################################################
#Save workspace data
filename <- paste('HPCC_results_', datestr, '.RData', sep = '')
filepath <- paste(homedir, filename, sep = '')
save(list = ls(all.names = TRUE), file = filepath, envir = .GlobalEnv)
################################################################################
Please let me know if there is anything that I can clarify above.
Best, Andrew
Hey Andrew,
There is no built-in parallelization in landscapemetrics. (That is also why raster::beginCluster()/endCluster() has no effect here: those only speed up raster functions that are written to use the cluster, not sample_lsm().) However, it should be quite easy to program that yourself. I would basically create a grid that covers your whole landscape (see e.g. the first part of this blog post). Once you have set up the grid, you can loop through it in parallel, extract the "local landscape" within the current grid cell (using raster::crop() and raster::mask()), and calculate the metrics, using e.g. the furrr package.
Cheers, Max
Hi Max @mhesselbarth ,
Thanks for weighing in as well. I sincerely appreciate the informative blog post you've shared; however, I should reiterate that I am still very new to parallel computing in R, so (based on your suggestions) the one thing I am missing is how to (1) loop over the larger continental grid, (2) extract the local landscapes one at a time within that loop, (3) compute the desired metrics, and (4) save the estimated metrics for each local landscape into one data frame, all in parallel, especially using the furrr package.
Is it feasible to provide a small code example demonstrating what you've suggested, especially the looping and the furrr functionality? Recall that I have two spatial objects: a continental raster and a SpatialPointsDataFrame of local-landscape centroids.
Thank you for your continued support. I sincerely appreciate your time.
In the meantime, I will also be working toward a similar solution in my own R programs; if I identify one, I will post it to this thread for the benefit of others.
Best, Andrew
This should do the trick:
library(landscapemetrics)
library(raster)
library(sf)
library(sp)
library(future)
library(furrr)
# load data
data("augusta_nlcd")
my_raster <- augusta_nlcd
# create grid as example. This could be any sp or sf object to loop through
my_grid <- sf::st_make_grid(st_as_sfc(sf::st_bbox(my_raster)), n = c(10, 10))
my_grid <- sf::st_sf(geom = my_grid)
# set to multisession; see ?plan for details
future::plan(multisession)
# loop through the large grid cells and calculate the metrics within each cell
result <- furrr::future_map(1:nrow(my_grid), function(i) {
# crop sample plot
augusta_nlcd_temp <- raster::crop(x = augusta_nlcd,
y = sf::as_Spatial(my_grid[i, ]))
# mask sample plot
augusta_nlcd_temp <- raster::mask(x = augusta_nlcd_temp,
mask = sf::as_Spatial(my_grid[i, ]))
# calculate desired metrics
landscapemetrics::calculate_lsm(landscape = augusta_nlcd_temp,
what = c("lsm_p_area", "lsm_c_ca", "lsm_l_ta"))
})
Cheers, Max
Hi Max @mhesselbarth ,
Thank you very much for providing the example code above. Your demo has helped me understand the problem quite a bit more. Below, I have adapted your code to more closely match my purposes (e.g., using spatial points rather than a grid), and I look forward to testing it on our HPC soon.
# for metrics and data manipulation
library(landscapemetrics)
library(raster)
library(sf)
library(sp)
library(dplyr)
# for parallel processing
library(future)
library(furrr)
# load the data
my_raster <- raster("/mnt/home/user/Documents/LSM-2020/CEC 2015/NA_NALCMS_2015_LC_30m_LAEA_mmu5pix_.tif") # name must match the references inside the loop below
pts <- readRDS("/mnt/home/user/Documents/LSM-2020/pts.rds")
# create 39.4-km circular buffers around the local-landscape centroids and use them in place of my_grid (see above);
# st_as_sf() keeps the attribute table and makes nrow() and row-indexing work below
my_pts <- sf::st_buffer(sf::st_as_sf(pts), 39428.93)
# set to multisession; see ?plan for details
future::plan(multisession)
# loop through each of the polygon buffers around local-landscape centroids and calculate metrics for them
my_metric <- furrr::future_map(1:nrow(my_pts), function(i) {
# crop sample plot
my_raster_temp <- raster::crop(x = my_raster,
y = sf::as_Spatial(my_pts[i, ]))
# mask sample plot
my_raster_temp <- raster::mask(x = my_raster_temp,
mask = sf::as_Spatial(my_pts[i, ]))
# calculate desired metrics
  landscapemetrics::calculate_lsm(landscape = my_raster_temp,
                                  level = "landscape", metric = "ent")
})
warnings()
my_metric
# merging results; see https://nowosad.github.io/post/lsm-bp2/
# each list element is a one-row tibble, so stack them first, then attach to the buffers
my_result <- dplyr::bind_cols(my_pts, dplyr::bind_rows(my_metric))
plot(my_result["value"])
Hoping for the best, Andrew
Looks good. You should actually be able to use future_map_dfr(), I think, which row-binds the results for you, so you can skip the separate bind_rows() call. If you want to use an HPC, the clustermq package might also be of interest to you. If your HPC uses SLURM, this homepage might help you as well.
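For illustration, a minimal sketch of that suggestion, reusing my_pts and my_raster from the code above (everything else as before; only the map function and the merging step change):
# sketch: future_map_dfr() row-binds the one-row result tibbles automatically
my_metric <- furrr::future_map_dfr(1:nrow(my_pts), function(i) {
  my_raster_temp <- raster::crop(x = my_raster, y = sf::as_Spatial(my_pts[i, ]))
  my_raster_temp <- raster::mask(x = my_raster_temp, mask = sf::as_Spatial(my_pts[i, ]))
  landscapemetrics::calculate_lsm(landscape = my_raster_temp,
                                  level = "landscape", metric = "ent")
})
# the geometries still need to be attached once for plotting
my_result <- dplyr::bind_cols(my_pts, my_metric)
plot(my_result["value"])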
Will close this for now.
Hello,
Is there an approach currently available within the landscapemetrics package to run the sample_lsm() function in parallel, using multiple processors to help reduce total computation time? I am using R v. 4.0.2 and package v. 1.4.5 at the moment.
I have a (single) land-use land-cover raster, spanning North America, within which I would like to summarize a handful of metrics for >2,000 locations (i.e., landscape centroids) within a static circular buffer per landscape. I suspected that this process would take some time, particularly given the size of the landscapes (e.g., involving 39-km radial distances). Preliminary tests on my PC ran for hours without completing.
To date, bracketing my sample_lsm() call with raster::beginCluster(n=ncores) and raster::endCluster() does not seem to speed up the process.
Does anyone have any suggestions for approaching this problem using the landscapemetrics package? I can certainly provide example code later on, especially if that is helpful.
Thank you for your time and assistance in advance!
Best, Andrew