ipeaGIT / r5r

https://ipeagit.github.io/r5r/

R crashes with realloc(): invalid next size error when feeding more than 3500 from/to points to detailed_itineraries-function #144

Closed SRN1973 closed 3 years ago

SRN1973 commented 3 years ago

After successfully building the r5r network for the whole of Germany, including the complete German GTFS data, I am able to run simple queries with r5r. However, if I feed more than 3500 points to the detailed_itineraries function (up to 3500 points it works as expected), my whole R process crashes with one of the following error messages. The error seems to occur after the calculation has finished, as it appears some seconds after all my cores have stopped working and are almost idle again.

  1. POSSIBLE ERROR MESSAGE

12:41:20.001 [main] INFO org.ipea.r5r.R5RCore - Loading R5NETWORK file saved by R5 version v6.0.1-2-gf44e585 commit f44e5850617cf563b8caf6709ec4bba342ff328d
Using cached network.dat from /home/neumeier/b/Open_Trip_Planner/graphs/test_r5r//network.dat
Warning message:
In for (i in ic) value[[i]] <- as.vector(x[, i]) :
  closing unused connection 3 (https://www.ipea.gov.br/geobr/r5r/metadata.csv)
Feb 04, 2021 12:43:54 PM org.hsqldb.persist.Logger logInfoEvent
INFO: dataFileCache open start
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size
caught segfault
address 0x7f153d5d64b8, cause 'invalid permissions'

Traceback:
1: forderv(byval, sort = keyby, retGrp = TRUE)
2: [.data.table(path_options, , :=(total_duration, sum(segment_duration, wait)), by = .(fromId, toId, option))
3: path_options[, :=(total_duration, sum(segment_duration, wait)), by = .(fromId, toId, option)]
4: detailed_itineraries(r5r_core = r5r_core, origins = fromPlace, destinations = zielPointsForStartPoint, mode = mode, mode_egress = "WALK", departure_datetime = departure_datetime, max_walk_dist = max_walk_dist, shortest_path = TRUE, verbose = FALSE, drop_geometry = TRUE, n_threads = Inf)
5: r5r_thuenen_detailed(fromPlace[1:5000, ], toPlace[1:5000, ], mode, departure_datetime, max_walk_dist, r5r_core)
An irrecoverable exception occurred. R is aborting now ...
Aborted (core dumped)

  2. POSSIBLE ALTERNATIVE ERROR MESSAGE

In for (i in ic) value[[i]] <- as.vector(x[, i]) :
  closing unused connection 3 (https://www.ipea.gov.br/geobr/r5r/metadata.csv)
Feb 04, 2021 12:52:52 PM org.hsqldb.persist.Logger logInfoEvent
INFO: dataFileCache open start
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size
realloc(): invalid next size

caught segfault
address 0x300000003000, cause 'memory not mapped'
Aborted (core dumped)

I would be grateful for any hint about the cause of this error or how to prevent it, as in the end I intend to feed 28,569,115 from-to combinations to the function in order to generate a public transport accessibility map for Germany...


Operating System: Ubuntu 20.04.1 LTS
RAM: 1 TB
CPU: 120 cores

Input files:

- germany-latest-osm.pbf (https://download.geofabrik.de/europe/germany-latest.osm.pbf) with dropped relations, as recommended here: https://github.com/ipeaGIT/r5r/issues/141
- GTFS files (whole public transport for Germany):
  https://download.gtfs.de/germany/fv_free/latest.zip
  https://download.gtfs.de/germany/nv_free/latest.zip
  https://download.gtfs.de/germany/rv_free/latest.zip

This is the part of my program code that causes the error stated above

accessibilityAnalysisResults <- NULL
accessibilityAnalysisResults <- detailed_itineraries(
  r5r_core = r5r_core,
  origins = fromPlace[1:5000, ],
  destinations = toPlace[1:5000, ],
  mode = c("WALK", "TRANSIT"),
  mode_egress = "WALK",
  departure_datetime = as.POSIXct("20-10-2020 9:00:00", format = "%d-%m-%Y %H:%M:%S"),
  max_walk_dist = Inf,
  shortest_path = TRUE,
  verbose = FALSE,
  drop_geometry = TRUE,
  n_threads = Inf
) # eo detailed_itineraries

xtimbeau commented 3 years ago

Hi, I am doing something similar for Ile de France. I send the requests in slices in order to limit the memory load. There is a small cost in performance, but it is acceptable as most of the time is spent calculating the itineraries.
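
A minimal sketch of what sending one-to-one requests in slices could look like. The helper `route_in_slices()` and the `route_fun` argument are hypothetical names introduced here for illustration; `route_fun` stands in for a wrapper around detailed_itineraries() and is replaced by a stub so the example runs without a network.

```r
# Hypothetical helper: route one-to-one pairs in fixed-size slices.
# `route_fun(origins, destinations)` stands in for a detailed_itineraries() wrapper.
route_in_slices <- function(origins, destinations, route_fun, slice_size = 1000L) {
  stopifnot(nrow(origins) == nrow(destinations), slice_size >= 1L)
  n <- nrow(origins)
  if (n == 0L) return(NULL)
  # Slice index for every row: 1, 1, ..., 2, 2, ...
  slice_id <- ceiling(seq_len(n) / slice_size)
  results <- lapply(split(seq_len(n), slice_id), function(rows) {
    route_fun(origins[rows, , drop = FALSE], destinations[rows, , drop = FALSE])
  })
  # Bind once at the end instead of growing a data frame inside the loop.
  do.call(rbind, results)
}

# Usage with a stub router that only pairs up the ids.
stub_router <- function(o, d) data.frame(fromId = o$id, toId = d$id)
origins <- data.frame(id = 1:10, lon = 0, lat = 0)
destinations <- data.frame(id = 101:110, lon = 0, lat = 0)
res <- route_in_slices(origins, destinations, stub_router, slice_size = 4L)
```

The slice size is the knob that trades memory use against per-call overhead; a suitable value depends on the network and hardware and is not something this sketch can determine.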


SRN1973 commented 3 years ago

Thank you xtimbeau for your advice. I already tried sending the requests in slices. However, because of the time lag that the parallelisation overhead causes for every slice in the detailed_itineraries function, this is not an option for me.

As mentioned, I do the analysis for the whole of Germany based on an underlying 250 m x 250 m raster, and in total I have 28,569,115 requests (at the moment). Performing this with OpenTripPlanner takes around 6 days on the hardware described above. As this is a repeated task for me, I am looking for something faster (similar to street routing with OSRM, where around 57 million requests can be calculated in around 8 hours when parallelised with Perl and in 16 hours when parallelised with R)...

As far as I can see, I do not run into any memory issues: after starting the process, more than half of my 1 TB of RAM is still unused...

mvpsaraiva commented 3 years ago

Hi @SRN1973, thanks for using our package. It is really interesting to see how r5r performs in such a large use case. First, I'd try @xtimbeau's suggestion and send the requests in slices, as memory use can escalate quickly and some errors may occur even before the available physical memory is full.

Second, do you really need detailed_itineraries() to calculate your accessibility map? If you're only looking for travel times and don't need the extra outputs of detailed_itineraries(), you can use travel_time_matrix() which is orders of magnitude faster.

Anyways, I'll try to replicate this error and get back to you ASAP.

SRN1973 commented 3 years ago

Thank you mvpsaraiva for your response. Unfortunately I need the detailed_itineraries function, as I want to be able to control the walking times/distances to and from the public transport stations, as well as the waiting times etc. My intention is to model the accessibility of basic services as experienced by people. As such, I want to filter out results that represent trips nobody would choose (that is, trips with long waiting times at the beginning or in between, or trips with stations several kilometers away, etc.). Unfortunately, travel_time_matrix does not return this information.

mvpsaraiva commented 3 years ago

travel_time_matrix() allows for some control of the issues you've mentioned:

If you need more control than that, then I'm afraid detailed_itineraries() is your only option, although it was not intended for calculating that amount of itineraries at once.

xtimbeau commented 3 years ago

Maybe, if you only need to filter out itineraries with too much walking, you can use max_walk_dist, and to limit waiting time a proxy is max_rides, which limits transfers. Limiting waiting time directly does not seem possible, but it probably doesn't make sense anyway when considering total travel time. travel_time_matrix is faster and returns less data, and hence allows for larger requests. I managed to make r5r work with Azure Batch and R. Disregarding budget limits, there are no limits to the hardware you can access. Are you calculating all pairs on a 250 m grid for Germany as a whole? That seems huge, much more than 20 × 10^6, doesn't it?


SRN1973 commented 3 years ago

Thank you for your advice. @xtimbeau no, it is not that much, it is only 28,569,115 single trips. In order to reduce the number of trips to be calculated, I do not calculate all connections from every cell of the analysis raster to all other cells. Instead, in a first step I search, for every cell, for the nearest 5 (or so) locations of the basic services I am interested in via a knn analysis (nearest points by Euclidean distance), and in a second step I calculate the travel time to these five destinations. In a third step I filter out the "best/shortest" result. As I do this for all cells of my analysis raster, I get very detailed accessibility results far below the administrative level of the municipalities (and as I am interested in differences within a municipality, I decided against the simpler isochrone approach). What is debatable is whether the nearest-five-locations decision makes sense here. Whereas such a limit is not too problematic in a street network, it might distort the result of a public transport analysis, as there the nearest locations by distance might not always be the nearest locations by public transport time...
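
The three steps described above (nearest candidates by Euclidean distance, routing, keeping the best result) can be sketched roughly as follows. The function names, the column names `x`/`y`, and the travel times are assumptions made for illustration; coordinates are assumed to be projected so that Euclidean distance is meaningful, and at the scale discussed here a dedicated nearest-neighbour package would be the practical choice rather than this base R loop.

```r
# Hypothetical helper: for each origin, the k nearest services by straight-line distance.
nearest_k_pairs <- function(origins, services, k = 5L) {
  k <- min(k, nrow(services))
  pairs <- lapply(seq_len(nrow(origins)), function(i) {
    d <- sqrt((services$x - origins$x[i])^2 + (services$y - origins$y[i])^2)
    nearest <- order(d)[seq_len(k)]
    data.frame(fromId = origins$id[i], toId = services$id[nearest], dist = d[nearest])
  })
  do.call(rbind, pairs)
}

# Step three: keep the fastest routed result per origin.
best_per_origin <- function(routed) {
  routed <- routed[order(routed$fromId, routed$travel_time), ]
  routed[!duplicated(routed$fromId), ]
}

origins  <- data.frame(id = c("a", "b"), x = c(0, 1000), y = c(0, 0))
services <- data.frame(id = c("s1", "s2", "s3"), x = c(100, 900, 5000), y = c(0, 0, 0))
pairs <- nearest_k_pairs(origins, services, k = 2L)

# Travel times would come from the router; these are made-up numbers for illustration.
pairs$travel_time <- c(12, 30, 25, 8)
best <- best_per_origin(pairs)
```

In this toy data the service nearest to origin "b" by distance is "s2", while the fastest one is "s1", which is exactly the distortion the paragraph above worries about.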

However when I use the travel_time_matrix() approach...how do I feed the points? If I insert fromPlace[1:5, ] toPlace[1:5, ] assuming that I want the travel times for first point of fromPlace to first point of toPlace, then the second point of fromPlace to the second point of toPlace and so on. I get duplicated results like this...

    fromId toId travel_time_p025 travel_time_p050 travel_time_p075
 1:      1 1874               95               95               95
 2:      1  318               82               82               82
 3:      1 1874               95               95               95
 4:      1  318               82               82               82
 5:      1 1874               95               95               95
 6:      1  318               82               82               82
 7:      1 1874               95               95               95
 8:      1  318               82               82               82
 9:      1 1874               95               95               95
10:      1  318               82               82               82

xtimbeau commented 3 years ago

OK, I understand, limiting the number of pairs is of course the first thing to do. What I do myself to feed the travel_time_matrix request is to put in fromId the origin points I want (where people live) and in toId the opportunities I want to calculate the accessibility to. If opportunities are located at the same place where people live, then I do the calculation twice. That could be avoided, but if the number of opportunities is low, it doesn't matter.

I select origins in slices (not too large) of points that are close together. The opportunities are not all opportunities, but a selection of opportunities likely to be accessed from the selected origins (hence the need to have clusters of origins in the first place). I select the opportunities based on a heuristic: I sample a few origins, calculate all distances to opportunities, and remove the opportunities too far to be reached by going on foot to the sampled origins from all origins. It may seem unclear, but with a diagram it's quite simple: fromId = cluster of origins, toId = selection of opportunities, with the selection done by approximating the distance between the cluster and the opportunities.

The clusters are the basis for slicing the grid. This allows me to control the load on R5, and in my experience there is a point where the time lost in data exchange between R and Java becomes lower than the time spent calculating the routes in R5. Beyond that point, expanding the size of the cluster does not increase the number of pairs calculated per second. Hope this helps! XT

PS: I can calculate around 20k pairs per second on an 8-core/16-thread machine using 48 GB of RAM (for Java), and for a 50 m by 50 m grid over Ile de France it takes around 1 day of computation. In the end, more than 10^9 pairs are calculated. OSRM is way faster, but it is not doing the same thing. The road network is denser, so I think the algorithms are different. R5 gives more information by returning the Monte Carlo draws. I do, however, reduce these to the minimum (1 draw per minute step of time_window), as this has a minor but positive effect on the calculation time.
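
A rough sketch of pairing one cluster of origins with a reduced set of opportunities. This is a simplification, not the heuristic described above: instead of sampling origins, it keeps every opportunity within `max_dist` of the cluster's bounding box. The function name, the `x`/`y` columns (projected coordinates in metres), and the threshold are assumptions for illustration.

```r
# Hypothetical helper: opportunities worth sending along with one cluster of origins.
select_opportunities <- function(cluster, opportunities, max_dist) {
  # Per-axis distance from each opportunity to the cluster's bounding box
  # (zero when the opportunity lies inside the box on that axis).
  dx <- pmax(min(cluster$x) - opportunities$x, 0, opportunities$x - max(cluster$x))
  dy <- pmax(min(cluster$y) - opportunities$y, 0, opportunities$y - max(cluster$y))
  opportunities[sqrt(dx^2 + dy^2) <= max_dist, , drop = FALSE]
}

cluster <- data.frame(id = 1:4, x = c(0, 250, 0, 250), y = c(0, 0, 250, 250))
opportunities <- data.frame(
  id = c("near", "inside", "far"),
  x = c(1000, 100, 50000),
  y = c(100, 100, 100)
)
kept <- select_opportunities(cluster, opportunities, max_dist = 5000)
```

Each cluster and its `kept` opportunities would then form one all-to-all request, so the cluster size controls how much work is handed to the router per call.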


SRN1973 commented 3 years ago

I tried it with slicing, but I nevertheless encounter an error. Somewhere between the 9721st and the 9725th distance calculation the process ends with the following error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  java.util.concurrent.ExecutionException: java.lang.IllegalStateException: java.lang.IllegalStateException: java.lang.IllegalStateException: No valid itineraries found for path computed in RaptorWorker
In addition: Warning messages:
1: In assert_points_input(origins, "origins") : 'origins$id' forcefully cast to character.
2: In assert_points_input(destinations, "destinations") : 'destinations$id' forcefully cast to character.
Called from: .jcheck(silent = FALSE)

Below is my code:

fromPlaceTest <- fromPlace[1:10000, ]
toPlaceTest <- toPlace[1:10000, ]

print(paste0("Starting accessibility analysis: ", Sys.time()))
start <- Sys.time()

accessibilityAnalysisResults <- NULL

chunkSize <- 5
chunkStart <- 1
chunkEnd <- chunkSize

while (chunkStart <= nrow(fromPlaceTest)) {
  tmpResults <- NULL
  if (chunkEnd <= nrow(fromPlaceTest)) {
    print(paste0(chunkStart, "---->", chunkEnd, " of ", nrow(fromPlaceTest)))
    tmpResults <- r5r_thuenen_detailed(fromPlace[chunkStart:chunkEnd, ], toPlace[chunkStart:chunkEnd, ], mode, departure_datetime, max_walk_dist, r5r_core)
  } else {
    # Last chunk: clamp the end index to the number of rows
    chunkEnd <- nrow(fromPlaceTest)
    print(paste0("last chunk: ", chunkStart, "---->", chunkEnd))
    tmpResults <- r5r_thuenen_detailed(fromPlace[chunkStart:chunkEnd, ], toPlace[chunkStart:chunkEnd, ], mode, departure_datetime, max_walk_dist, r5r_core)
  } # eo if else

  print(head(tmpResults))

  # Advance to the next chunk and append the results
  chunkEnd <- chunkEnd + chunkSize
  chunkStart <- chunkEnd - chunkSize + 1
  accessibilityAnalysisResults <- rbind(accessibilityAnalysisResults, tmpResults)
} # eo while

# format() keeps the time unit chosen by difftime
print(paste0("Calculation finished. Total computation time: ", format(Sys.time() - start)))

mvpsaraiva commented 3 years ago

This is interesting.... the message

java.lang.IllegalStateException: No valid itineraries found for path computed in RaptorWorker

indicates there is no path between some particular origin and destination, but this shouldn't halt the entire process. Now I have a better clue on where to look and how to fix this issue. I'll keep you posted.

SRN1973 commented 3 years ago

...I experimented a bit with the parameters. Interestingly, if I put the detailed_itineraries function in a try() expression, it seems not to stop at the described error, at least not immediately. In addition, I realized that the process is likely to handle more of the calculations if I decrease n_threads drastically, from my 120 available cores to only 10 for example. With fewer threads, more calculations are performed before the R process crashes as described...

xtimbeau commented 3 years ago

In my experience, the memory consumption of R5 is directly linked to the number of threads. The rule of thumb I apply is 4 to 6 GB per thread, for a network.dat of around 200 MB.
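
The rule of thumb above can be turned into a small calculation for choosing a thread count. The helper below is a hypothetical illustration, not part of r5r; it takes the memory actually available to Java (the heap is typically set through `options(java.parameters = ...)` before the package is loaded) and the 4 to 6 GB per thread figure quoted above, which was observed for a much smaller network and may not carry over to a Germany-wide one.

```r
# Hypothetical helper: cap the thread count by available Java heap and by core count.
max_threads_for_memory <- function(java_heap_gb, cores, gb_per_thread = 6) {
  by_memory <- floor(java_heap_gb / gb_per_thread)
  max(1L, min(as.integer(by_memory), as.integer(cores)))
}

max_threads_for_memory(48, cores = 16)     # 8
max_threads_for_memory(1024, cores = 120)  # 120
```

The result would be passed as `n_threads` in place of `Inf`. With a heap close to the full 1 TB the rule does not limit a 120-core machine, so the limit only bites when the heap given to Java is much smaller than the physical RAM.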


SRN1973 commented 3 years ago

Now I think I know what causes the following error: "java.lang.IllegalStateException: No valid itineraries found for path computed in RaptorWorker"

After experimenting with different values for the routing options, I realized that this error seems to be connected to the value passed to the detailed_itineraries function through the "departure_datetime" parameter. That is, if I pass a different departure time, the detailed_itineraries function works as expected.

e.g. in my example:
departure_datetime <- as.POSIXct("20-10-2020 9:00:00", format = "%d-%m-%Y %H:%M:%S") -> leads to the error
departure_datetime <- as.POSIXct("20-10-2020 9:10:00", format = "%d-%m-%Y %H:%M:%S") -> leads to the error
departure_datetime <- as.POSIXct("20-10-2020 9:20:00", format = "%d-%m-%Y %H:%M:%S") -> leads to the error
departure_datetime <- as.POSIXct("20-10-2020 9:30:00", format = "%d-%m-%Y %H:%M:%S") -> calculation is performed
departure_datetime <- as.POSIXct("20-10-2020 10:00:00", format = "%d-%m-%Y %H:%M:%S") -> calculation is performed

...the problem with this error is that if it occurs in a bigger chunk of from-to calculations performed in parallel, the whole chunk is aborted. As such, it is not really possible to omit only the problematic connection, and falling back to a 1:1 calculation mode within a loop is way too slow...

mvpsaraiva commented 3 years ago

Hi @SRN1973, I couldn't reproduce this error, but I've made an attempt to fix it anyway. I've added a try/catch statement to continue the process when there's an error in a single path. Try updating the package to the latest version from GitHub, and let me know whether this bug is fixed or not.

SRN1973 commented 3 years ago

Thank you for your effort! I tried it with my data, but unfortunately, although the error seems to be fixed for some from-to paths, others still produce the error I described earlier. (I suspect that this is the error message produced by the fix: 12:17:59.320 [ForkJoinPool-3059-worker-215] ERROR org.ipea.r5r.R5RCore - Error while finding path between 8116 and 2799.)

So I fear that different errors in the input dataset might cause the error at different stages of the calculation process...

However, I found a workaround for my specific case. If this error occurs in a chunk of from-to calculations, I capture it in a tryCatch() expression, and in the error handler I increase the start time by 10 minutes in a while loop. If the calculation succeeds, the result is returned. If it does not succeed within a given number of such iterations, I switch from chunk mode to an alternative mode that loops over every single from-to combination within the chunk (each inside a try() expression) and adds the result to the return variable only if the calculation was successful. With this I am able to stick to chunk mode for the greater part of my calculations and to filter out the erroneous combinations. The downside of this approach is that chunks larger than 1000 combinations slow down the whole calculation process if they contain several erroneous combinations...
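
The workaround described above could look roughly like the sketch below. The poster's actual code is not shown in the thread, so everything here is an assumption: the wrapper name, the `route_fun(origins, destinations, departure)` signature standing in for a detailed_itineraries() call, the number of retries, and the choice to use the original departure time in the per-pair fallback.

```r
# Hypothetical wrapper around a routing function that throws an error on failure.
route_chunk_with_fallback <- function(origins, destinations, departure, route_fun,
                                      max_shifts = 3L, shift_secs = 600) {
  # 1) Try the whole chunk, moving the departure 10 minutes later after each failure.
  for (shift in 0:max_shifts) {
    res <- tryCatch(
      route_fun(origins, destinations, departure + shift * shift_secs),
      error = function(e) NULL
    )
    if (!is.null(res)) return(res)
  }
  # 2) Fall back to one pair at a time and keep only the pairs that succeed.
  per_pair <- lapply(seq_len(nrow(origins)), function(i) {
    tryCatch(
      route_fun(origins[i, , drop = FALSE], destinations[i, , drop = FALSE], departure),
      error = function(e) NULL
    )
  })
  do.call(rbind, Filter(Negate(is.null), per_pair))
}

# Stub router: fails whenever the chunk contains the origin "bad".
stub_router <- function(o, d, departure) {
  if (any(o$id == "bad")) stop("No valid itineraries found")
  data.frame(fromId = o$id, toId = d$id, stringsAsFactors = FALSE)
}
origins <- data.frame(id = c("a", "bad", "c"), stringsAsFactors = FALSE)
destinations <- data.frame(id = c("x", "y", "z"), stringsAsFactors = FALSE)
departure <- as.POSIXct("2020-10-20 09:00:00", tz = "UTC")
res <- route_chunk_with_fallback(origins, destinations, departure, stub_router)
```

Note that results obtained after a shift refer to a later departure time than requested, which may or may not be acceptable for the analysis at hand.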

mvpsaraiva commented 3 years ago

ERROR org.ipea.r5r.R5RCore - Error while finding path between 8116 and 2799

That's the error message I've added. It reports that the error happened when finding the path between origin id 8116 and destination id 2799. I was hoping it would report the error and continue the calculation instead of crashing the whole process. Anyway, this seems to be an upstream bug in R5, so there's little we can do on our side to fix it.

Going back a few messages, I was suggesting you could use travel_time_matrix instead of detailed_itineraries, which is much faster. You've asked:

However when I use the travel_time_matrix() approach...how do I feed the points? If I insert fromPlace[1:5, ] toPlace[1:5, ] assuming that I want the travel times for first point of fromPlace to first point of toPlace, then the second point of fromPlace to the second point of toPlace and so on.

The way travel_time_matrix works is that it calculates travel times from all origins to all destinations. It doesn't work on a one-to-one basis like detailed_itineraries. So it calculates travel times from the 1st origin to all destinations, then from the 2nd origin to all destinations, and so on.
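
To make the difference concrete, here is a small illustration of the all-to-all shape and of how one-to-one pairs could be pulled out of it afterwards. The `ttm` data frame only mimics the shape of a travel time matrix; the ids and the travel time values are placeholders, not output from r5r.

```r
# Hypothetical ids; every origin is combined with every destination.
origin_ids <- c("o1", "o2", "o3")
dest_ids   <- c("d1", "d2", "d3")
ttm <- expand.grid(fromId = origin_ids, toId = dest_ids, stringsAsFactors = FALSE)
ttm$travel_time <- seq_len(nrow(ttm))  # placeholder values, not real travel times

# The one-to-one pairs actually wanted (1st origin to 1st destination, and so on).
wanted <- data.frame(fromId = origin_ids, toId = dest_ids, stringsAsFactors = FALSE)

# Keep only the wanted rows of the matrix.
one_to_one <- merge(wanted, ttm, by = c("fromId", "toId"))
```

Three origins and three destinations produce nine rows, of which only three are wanted, so extracting n one-to-one pairs this way costs on the order of n × n computations.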

If I understood correctly, you are only calculating travel times from each origin (I assume origins are all the 250m x 250m cells on the grid) to the 5 nearest points of interest? If that's the case, then there's not a single call of travel_time_matrix that will do the job with parallelisation done automatically by r5r.

I think you'll need one travel_time_matrix call for each cell, as:
origins = cell
destinations = 5 nearest neighbours

Since this call has only one origin, r5r cannot run it in parallel. You'll have to call travel_time_matrix many times in parallel using a package like furrr, for example.
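
A minimal sketch of issuing one call per cell in parallel. It uses base R's `parallel::mclapply` in place of furrr to stay free of extra dependencies; the wrapper name and `matrix_fun(origin, destinations)`, which stands in for a travel_time_matrix() call, are hypothetical. Whether a Java-backed r5r core tolerates forked workers is not verified here, and `mc.cores` greater than 1 is not supported on Windows.

```r
library(parallel)

# Hypothetical wrapper: one call per origin cell, each with its own candidate destinations.
# `candidates` is a list holding one destination data frame per origin row.
route_cells_in_parallel <- function(origins, candidates, matrix_fun, cores = 1L) {
  stopifnot(nrow(origins) == length(candidates))
  results <- mclapply(seq_len(nrow(origins)), function(i) {
    matrix_fun(origins[i, , drop = FALSE], candidates[[i]])
  }, mc.cores = cores)
  do.call(rbind, results)
}

# Stub standing in for the routing call: "travel time" is just the x offset.
stub_matrix <- function(o, d) {
  data.frame(fromId = o$id, toId = d$id, travel_time = abs(d$x - o$x))
}
origins <- data.frame(id = 1:2, x = c(0, 10))
candidates <- list(
  data.frame(id = c(11, 12), x = c(1, 2)),
  data.frame(id = 13, x = 15)
)
res <- route_cells_in_parallel(origins, candidates, stub_matrix, cores = 1L)
```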

Another approach is to use @xtimbeau method and slice the study area in smaller grids, then calculate travel times from all cells to all opportunities in that cluster (instead of just the 5 nearest).

rafapereirabr commented 3 years ago

Quick heads up. We have recently implemented a small change to the detailed_itineraries function which makes it much faster if the user is only interested in the shortest route (shortest_path = T). See #153. This change is already available in the dev version of r5r.

utils::remove.packages('r5r')
devtools::install_github("ipeaGIT/r5r", subdir = "r-package")
library(r5r)

ansoncfit commented 3 years ago

Another heads up: R5 v6.2 was released earlier this week. This version adds path capabilities to the main Raptor router (rather than the point-to-point router). See package com.conveyal.r5.transit.path and https://raw.githubusercontent.com/conveyal/docs/master/docs/guides/compute-freeform.mdx.

On our hosted computation cluster with automatic scaling (see https://conveyal.com/learn), we are now able to prepare detailed transit path results for millions of origin-destination pairs in a few minutes.

mvpsaraiva commented 3 years ago

That's great news, @ansoncfit. I'll check how to implement this new feature in r5r ASAP.

mvpsaraiva commented 3 years ago

Hi @SRN1973. I think we fixed this issue, and have discussed a lot via email. Thanks for your feedback. I'm closing this issue now, but feel free to open it again (or a new one) if you have further questions.