Closed rafapereirabr closed 4 years ago
@mvpsaraiva said these two parameters are working already and can be implemented in the R routing functions.
r5r_core$setTimeWindowSize(minutes)
r5r_core$setNumberOfMonteCarloDraws(n)
@mvpsaraiva will still implement this one:
r5r_core$setPercentiles(array inteiros)
I've implemented the setPercentiles()
function in the last jar. Here is an example of its usage:
# allocate RAM memory to Java
options(java.parameters = "-Xmx2G")
# build transport network, pointing to the path where OSM and GTFS data are stored
library(r5r)
library(tidyverse)
path <- system.file("extdata", package = "r5r")
r5r_core <- setup_r5(data_path = path, verbose = FALSE)
# load origin/destination points and set arguments
points <- read.csv(system.file("extdata/poa_hexgrid.csv", package = "r5r"))
poi <- read.csv(system.file("extdata/poa_points_of_interest.csv", package = "r5r"))
mode <- c("WALK", "BUS")
max_walk_dist <- 3000 # meters
max_trip_duration <- 60 # minutes
departure_datetime <- as.POSIXct("13-03-2019 14:00:00",
format = "%d-%m-%Y %H:%M:%S",
tz = "America/Sao_Paulo")
# set time window and percentiles
r5r_core$setTimeWindowSize(30L) # 30 minutes
r5r_core$setNumberOfMonteCarloDraws(120L) # 4 monte carlo draws (departures) per minute
r5r_core$setPercentiles(c(25L, 50L, 75L, 100L))
#' use:
#' r5r_core$setPercentiles(1L:100L)
#' to get all percentiles
# calculate a travel time matrix
ttm <- travel_time_matrix(r5r_core,
origins = poi,
destinations = points,
mode,
departure_datetime,
max_walk_dist,
max_trip_duration,
verbose = FALSE)
# plot results
ttm %>%
filter(fromId == "bus_central_station") %>%
left_join(points, by=c("toId"="id")) %>%
pivot_longer(cols=starts_with("travel_time"), names_to = "percentile", values_to = "travel_time") %>%
filter(travel_time <= max_trip_duration) %>%
ggplot(aes(x=lon, y=lat, colour=travel_time)) +
geom_point(data=points, aes(x=lon, y=lat), colour="grey85") +
geom_point() +
scale_colour_distiller(palette = "Spectral") +
coord_map() +
facet_wrap(~percentile)
I am not entirely sure how we should use the setNumberOfMonteCarloDraws
parameter here. What is the defaul number used in R5? Is there any R5 documentation on this? Perhaps we could pick up on @mattwigway's brain here.
I suggest we use the following default values:
setTimeWindowSize( )
Value equal to 0
. By default the function only considers a single departure time
setNumberOfMonteCarloDraws( )
Value equal to setTimeWindowSize()
. By default one departure per minute.
setPercentiles( )
Value vetor equal to c(25L, 50L, 75L, 100L)
Let me know what you think.
Just a few things to consider. The default for setTimeWindowSize( )
must be 1
, because R5 doesn't work with 0
minutes. However, if the time window is 1
minute, it doesn't make sense to set different percentiles because the function will return the same value for all of them. So, maybe setPercentiles()
default could be c(50)
, so if the user only remembers to set a larger time window but forgets about the percentiles, r5r
would return the median travel time for that window. Also, all current code would have the exact same behaviour as today, after the update, which is a bonus.
Regarding how to expose those parameters, are we just adding extra parameters to the travel_time_matrix()
function? I guess this would be the simplest option, and I think we don't have any new parameters to add after those in the near future.
Another thing to take note: values set using the setTimeWindowSize()
and setNumberOfMonteCarloDraws()
are persistent within the same r5r_core
object, and they also affect the detailed_itineraries()
function. Hence, if we want to keep the behaviour of detailed_itineraries()
unchanged, we need to call those functions within detailed_itineraries()
in order to restore default values before calling the Java functions (planSingleTrip()
or planMultipleTrips()
).
I think the default in R5 is 240 Monte Carlo draws, and I used to use 1000 for production-quality results, but those are just based on intuition. This only comes into play when you are using frequency-based routes anyhow; if you have only schedule-based routes the number of Monte Carlo draws won't affect the results, and in fact I think that if this is the case the number of Monte Carlo draws is just ignored. There more details on how the Monte Carlo results affect the results in this paper. In that paper we developed a bootstrapping algorithm to estimate the uncertainty due to the Monte Carlo process, although I don't think the bootstrapping part of R5 has been used in a while and I would honestly be surprised if it still works - but the results may be informative. You might also check with @ansoncfit about the value they are recommending for this parameter at Conveyal. In any case I wouldn't recommend setting this to 1, as that will mean that the accessibility results will be based on a single randomly-generated schedule for the frequency based routes which will likely be unstable.
The percentiles may not all be the same with a single departure time, if there are frequency based routes and there is more than one Monte Carlo draw.
I'd also argue for setting the default time window size to at least an hour—I really think that results are better when analyzed over a time window, and setting this parameter to do that by default will gently regularize this practice.
Also, just to correct the description of the issue, R5 uses travel times from later departures to bound the travel times at earlier departures, because a trip leaving at 10:00 cannot take longer than a trip leaving at 10:01 plus one minute - because you can always just wait at the origin and leave later.
Thank you both for your comments! I've created a time_window
branch where I've implemented the parameters as follows:
setTimeWindowSize( )
Value equal to 60, a 1-hour interval
setNumberOfMonteCarloDraws( )
No input. Using R5
defautl
setPercentiles( )
Value equal to c(50L, 50L), to get the median.
Even though these are the defaul values, the function is returning and error when the user does not pass these parameters explicitly (see reprex below).
library(r5r)
# build transport network
data_path <- system.file("extdata", package = "r5r")
r5r_core <- setup_r5(data_path = data_path)
# load origin/destination points
points <- read.csv(system.file("extdata/poa_hexgrid.csv", package = "r5r"))[1:5,]
mode <- c("WALK", "TRANSIT")
max_walk_dist <- Inf
max_trip_duration <- 120L
departure_datetime <- as.POSIXct("13-03-2019 14:00:00",
format = "%d-%m-%Y %H:%M:%S")
# estimate travel time matrix
travel_time_matrix(r5r_core,
origins = points,
destinations = points,
mode,
departure_datetime,
max_walk_dist,
max_trip_duration
# , time_window= 60
# , percentiles = c(50, 50)
)
The problem is that the order of the parameters has changed. So, effectively, you're passing the value of max_walk_dist
to time_window
, and max_trip_duration
to percentiles
. Apart from that, there is a bug in the Java code that doesn't accept a single percentile as input, and I guess that's why you've set c(50, 50)
as the default, instead of just 50
. I'll fix this in the next version of the jar.
The function works if I specify the other parameters like this:
# estimate travel time matrix
travel_time_matrix(r5r_core,
origins = points,
destinations = points,
mode = mode,
departure_datetime = departure_datetime,
max_walk_dist = max_walk_dist,
max_trip_duration = max_trip_duration
)
You are right about the c(50, 50)
. I will wait for you to update the JAR file to do a pull request.
See the guidance on number of draws here: https://docs.conveyal.com/analysis/configuration#simulated-schedules
We have a paper under review that characterizes the number of draws and convergence, I'll hope to share it soon.
Thanks for the link @ansoncfit . This is a good reference to point to in our documentation. Please let us know when you have the paper published so we can cite it in our documentation as well.
I've now added new parameters to the travel_time_matrx
fuction to allow for multiple estimates over a time window set by the user. See pull request #117.
For now, the number of Monte Carlo simulations is 200, and the user has no control over it. We could however add this as a parameter for users to change, it would only add a bit more complexity for the end users who, in general, are not familiar with this. This is my concern.
Following the discusson on issue #118 , I seeting the number of Monte Carlo Draws to 5 times the size of time_window
. A one-minute time window would use 5 draws, while a 60-minute window would use 300 draws and so on.
This capability has been implemented in the new version of r5r
v0.2.0 published on CRAN today. closing this issue for now.
One of the advantages os R5 is how it uses searches from previous departure times to bound the path searches of other departure times later. This gives a big boost to performance and allows us to estimave travel times departing every minute within a time window. There are various advantages of this approach from a sampling perspective (see some works by Marcin Stepniak, Andrew Owen and Steve Farber). I will just reproduce below the suggestion from Matthew Conway (one of the core developers of R5).