ipeaGIT / r5r

https://ipeagit.github.io/r5r/

r5r waiting for return from r5 but r5 seems to be finished with the calculations #198

Closed SRN1973 closed 2 years ago

SRN1973 commented 3 years ago

Today I encountered the following odd behaviour (example code below):

When I execute the detailed_itineraries function inside mclapply in parallel (mc.cores = 2 or more), the whole process gets stuck after returning some results (sometimes fewer, sometimes more). As far as I can see, r5 has finished its calculations in the background, but r5r seems to wait for something that is never returned. As a result, when executing it in RStudio the session never finishes: the prompt never returns, although no processes are active anymore.

1 out of 1 origins processed... DONE!
Consolidating results... DONE!
(hangs in this state forever)

When I execute the same function inside mclapply in a sequential mode (mc.cores = 1) everything seems to work as expected.

1 out of 1 origins processed... DONE!
Consolidating results... DONE!
1 out of 1 origins processed... DONE!
Consolidating results... DONE!


CODE TO REPRODUCE THE BEHAVIOUR (based on data included in the r5r package)

options(java.parameters = "-Xmx10G")

# 1) build transport network, pointing to the path where OSM and GTFS data are stored

library(r5r) # 0.6.0
library(doParallel)
library(R.utils)
library(future)

print("read network...")

path <- system.file("extdata/poa", package = "r5r")
r5r_core <- setup_r5(data_path = path, verbose = FALSE)

# 2) load origin/destination points and set arguments

print("read points...")

points <- read.csv(system.file("extdata/poa/poa_hexgrid.csv", package = "r5r"))

# 3) set some parameters

mode <- c("WALK", "BUS")
max_walk_dist <- 3000 # meters
max_trip_duration <- 60 # minutes
departure_datetime <- as.POSIXct("13-05-2019 14:00:00", format = "%d-%m-%Y %H:%M:%S")

# 4) restructure the points (make a from-to point data.frame)

a <- points[1:613, ]
b <- points[614:1226, ]

names(a) <- c("id_a", "lon_a", "lat_a", "population_a", "schools_a")
names(b) <- c("id_b", "lon_b", "lat_b", "population_b", "schools_b")

points_new <- cbind(a, b)

# 5) public-transport accessibility request via the detailed_itineraries function

######################################################
# in my setup the error occurs if I feed the points 458 to 459 to the function
# if I feed 1:458 the calculation is ok
# if I feed 1:459 the calculation gets "stuck"
######################################################

result_total <- NULL
result_total <- mclapply(458:459, function(i) {
  fromPoints <- points_new[i, c("id_a", "lon_a", "lat_a", "population_a", "schools_a")]
  toPoints <- points_new[i, c("id_b", "lon_b", "lat_b", "population_b", "schools_b")]
  names(fromPoints) <- c("id", "lon", "lat", "population", "schools")
  names(toPoints) <- c("id", "lon", "lat", "population", "schools")

  fromPoints$id <- as.character(fromPoints$id, stringsAsFactors = FALSE)
  toPoints$id <- as.character(toPoints$id, stringsAsFactors = FALSE)

  result <- r5r::detailed_itineraries(r5r_core = r5r_core,
                                      origins = fromPoints,
                                      destinations = toPoints,
                                      mode = c("WALK", "TRANSIT"),
                                      departure_datetime = as.POSIXct("13-05-2019 14:00:00", format = "%d-%m-%Y %H:%M:%S"),
                                      max_walk_dist = 1170,
                                      max_trip_duration = 60,
                                      shortest_path = TRUE,
                                      drop_geometry = TRUE,
                                      verbose = FALSE,
                                      max_rides = 3,
                                      n_threads = 1,
                                      progress = FALSE
  ) # eo detailed_itineraries

  return(result)
}, mc.cores = 2) # eo function/mclapply; if mc.cores is 1 the calculation finishes as expected, if it is > 1 the error occurs

print(result_total)


Operating System

Ubuntu 20.04.1 LTS
RAM: 1 TB
120 cores


sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] future_1.21.0 R.utils_2.10.1 R.oo_1.24.0 R.methodsS3_1.8.1 doParallel_1.0.16 iterators_1.0.13 foreach_1.5.1
[8] r5r_0.6.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 pillar_1.4.7 compiler_3.6.3 class_7.3-17 tools_3.6.3 digest_0.6.27 checkmate_2.0.0
[8] lifecycle_0.2.0 tibble_3.0.4 pkgconfig_2.0.3 rlang_0.4.10 jdx_0.1.4 DBI_1.1.1 rstudioapi_0.13
[15] curl_4.3.2 rJava_1.0-4 e1071_1.7-8 httr_1.4.2 dplyr_1.0.3 globals_0.14.0 generics_0.1.0
[22] vctrs_0.3.6 classInt_0.4-3 grid_3.6.3 tidyselect_1.1.0 glue_1.4.2 data.table_1.14.0 listenv_0.8.0
[29] sf_1.0-2 R6_2.5.1 parallelly_1.23.0 purrr_0.3.4 magrittr_2.0.1 backports_1.2.1 codetools_0.2-18
[36] ellipsis_0.3.1 units_0.7-2 KernSmooth_2.23-18 proxy_0.4-26 crayon_1.3.4

rafapereirabr commented 3 years ago

Hi @SRN1973. All computations in the routing and accessibility functions in r5r already run in parallel within the R5 Java engine. Given that, it seems to me that the problem might arise because you are calling r5r multiple times in parallel while r5r itself is already running in parallel.

mvpsaraiva commented 3 years ago

As @rafapereirabr said, r5r was designed to handle all parallelisation from the Java engine. When you add another layer of parallelism on top of that, it completely breaks that design, and we cannot debug or predict the consequences. That's why mclapply only works with mc.cores = 1.

I'm thinking of a way to implement the timeout parameter from issue #196, which I think may solve some of the problems you're having.
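
For readers landing here, a minimal sketch of what leaving the parallelism to the Java engine could look like for the reproduction script above: one detailed_itineraries call over all pairs instead of one call per pair inside mclapply. It assumes that detailed_itineraries pairs origins and destinations row by row when both tables have the same number of rows, which is worth checking against the documentation of your r5r version. split_od_pairs and route_all_pairs are hypothetical helper names, not part of r5r, and n_threads = 2 is an arbitrary choice.

```r
# Hypothetical helper (not part of r5r): split a wide from/to table, shaped like
# points_new in the report, into the two data.frames detailed_itineraries() takes.
split_od_pairs <- function(points_wide) {
  cols <- c("id", "lon", "lat")
  origins <- points_wide[, paste0(cols, "_a")]
  destinations <- points_wide[, paste0(cols, "_b")]
  names(origins) <- cols
  names(destinations) <- cols
  origins$id <- as.character(origins$id)
  destinations$id <- as.character(destinations$id)
  list(origins = origins, destinations = destinations)
}

# Hypothetical wrapper: a single call, with threading controlled by n_threads
# instead of an outer mclapply. Assumes row-wise pairing of origins/destinations.
route_all_pairs <- function(r5r_core, od, n_threads = 2) {
  r5r::detailed_itineraries(
    r5r_core = r5r_core,
    origins = od$origins,
    destinations = od$destinations,
    mode = c("WALK", "TRANSIT"),
    departure_datetime = as.POSIXct("13-05-2019 14:00:00",
                                    format = "%d-%m-%Y %H:%M:%S"),
    max_walk_dist = 1170,
    max_trip_duration = 60,
    shortest_path = TRUE,
    drop_geometry = TRUE,
    verbose = FALSE,
    n_threads = n_threads
  )
}

# Tiny made-up stand-in for points_new, so the helper can be tried without r5r.
demo <- data.frame(
  id_a = c("a1", "a2"), lon_a = c(-51.22, -51.21), lat_a = c(-30.03, -30.04),
  id_b = c("b1", "b2"), lon_b = c(-51.20, -51.19), lat_b = c(-30.05, -30.06),
  stringsAsFactors = FALSE
)
od <- split_od_pairs(demo)
```

With the objects from the report, the call would then be route_all_pairs(r5r_core, split_od_pairs(points_new)), with no mclapply around it.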

mvpsaraiva commented 2 years ago

I'm closing this issue now, since the problem here seems to be 'overparallelization'. @SRN1973, please let us know if you're still having problems.