Return travel matrices each minute over a time window

rafapereirabr commented 4 years ago

One of the advantages os R5 is how it uses searches from previous departure times to bound the path searches of other departure times later. This gives a big boost to performance and allows us to estimave travel times departing every minute within a time window. There are various advantages of this approach from a sampling perspective (see some works by Marcin Stepniak, Andrew Owen and Steve Farber). I will just reproduce below the suggestion from Matthew Conway (one of the core developers of R5).

Have you considered expanding your travel time matrix function to return all travel times over a time window (say, every minute from 8 am to 10 am)? By setting fromTime and toTime to be further than 60 seconds apart, R5 will calculate the full OD matrix for an origin to all destinations for every departure minute in that time window, a lot quicker than a naive loop would because it re-uses paths that were found a later minutes (because the slowest possible trip departing at 9:00 is by definition no slower than the trip departing at 9:01 + 1 minute, we can bound the search process and make it really fast). Details of these optimizations are in this paper: https://repository.asu.edu/items/54162

rafapereirabr commented 4 years ago

@mvpsaraiva said these two parameters are working already and can be implemented in the R routing functions.

r5r_core$setTimeWindowSize(minutes)
r5r_core$setNumberOfMonteCarloDraws(n)

@mvpsaraiva will still implement this one:

r5r_core$setPercentiles(array inteiros)

mvpsaraiva commented 4 years ago

I've implemented the setPercentiles() function in the last jar. Here is an example of its usage:

# allocate RAM memory to Java
options(java.parameters = "-Xmx2G")

# build transport network, pointing to the path where OSM and GTFS data are stored
library(r5r)
library(tidyverse)

path <- system.file("extdata", package = "r5r")
r5r_core <- setup_r5(data_path = path, verbose = FALSE)

# load origin/destination points and set arguments
points <- read.csv(system.file("extdata/poa_hexgrid.csv", package = "r5r"))
poi <- read.csv(system.file("extdata/poa_points_of_interest.csv", package = "r5r"))
mode <- c("WALK", "BUS")
max_walk_dist <- 3000   # meters
max_trip_duration <- 60 # minutes
departure_datetime <- as.POSIXct("13-03-2019 14:00:00",
                                 format = "%d-%m-%Y %H:%M:%S",
                                 tz = "America/Sao_Paulo")

# set time window and percentiles
r5r_core$setTimeWindowSize(30L) # 30 minutes
r5r_core$setNumberOfMonteCarloDraws(120L) # 4 monte carlo draws (departures) per minute
r5r_core$setPercentiles(c(25L, 50L, 75L, 100L)) 
#' use:
#' r5r_core$setPercentiles(1L:100L) 
#' to get all percentiles

# calculate a travel time matrix
ttm <- travel_time_matrix(r5r_core,
                          origins = poi,
                          destinations = points,
                          mode,
                          departure_datetime,
                          max_walk_dist,
                          max_trip_duration,
                          verbose = FALSE)

# plot results
ttm %>%
  filter(fromId == "bus_central_station") %>%
  left_join(points, by=c("toId"="id")) %>%
  pivot_longer(cols=starts_with("travel_time"), names_to = "percentile", values_to = "travel_time") %>%
  filter(travel_time <= max_trip_duration) %>%
  ggplot(aes(x=lon, y=lat, colour=travel_time)) +
  geom_point(data=points, aes(x=lon, y=lat), colour="grey85") +
  geom_point() +
  scale_colour_distiller(palette = "Spectral") +
  coord_map() +
  facet_wrap(~percentile)

rafapereirabr commented 4 years ago

I am not entirely sure how we should use the setNumberOfMonteCarloDraws parameter here. What is the defaul number used in R5? Is there any R5 documentation on this? Perhaps we could pick up on @mattwigway's brain here.

rafapereirabr commented 4 years ago

I suggest we use the following default values:

setTimeWindowSize( ) Value equal to 0. By default the function only considers a single departure time
setNumberOfMonteCarloDraws( ) Value equal to setTimeWindowSize(). By default one departure per minute.
setPercentiles( ) Value vetor equal to c(25L, 50L, 75L, 100L)

Let me know what you think.

mvpsaraiva commented 4 years ago

Just a few things to consider. The default for setTimeWindowSize( ) must be 1, because R5 doesn't work with 0 minutes. However, if the time window is 1 minute, it doesn't make sense to set different percentiles because the function will return the same value for all of them. So, maybe setPercentiles() default could be c(50), so if the user only remembers to set a larger time window but forgets about the percentiles, r5r would return the median travel time for that window. Also, all current code would have the exact same behaviour as today, after the update, which is a bonus.

Regarding how to expose those parameters, are we just adding extra parameters to the travel_time_matrix() function? I guess this would be the simplest option, and I think we don't have any new parameters to add after those in the near future.

Another thing to take note: values set using the setTimeWindowSize() and setNumberOfMonteCarloDraws() are persistent within the same r5r_core object, and they also affect the detailed_itineraries() function. Hence, if we want to keep the behaviour of detailed_itineraries() unchanged, we need to call those functions within detailed_itineraries() in order to restore default values before calling the Java functions (planSingleTrip() or planMultipleTrips()).

mattwigway commented 4 years ago

I think the default in R5 is 240 Monte Carlo draws, and I used to use 1000 for production-quality results, but those are just based on intuition. This only comes into play when you are using frequency-based routes anyhow; if you have only schedule-based routes the number of Monte Carlo draws won't affect the results, and in fact I think that if this is the case the number of Monte Carlo draws is just ignored. There more details on how the Monte Carlo results affect the results in this paper. In that paper we developed a bootstrapping algorithm to estimate the uncertainty due to the Monte Carlo process, although I don't think the bootstrapping part of R5 has been used in a while and I would honestly be surprised if it still works - but the results may be informative. You might also check with @ansoncfit about the value they are recommending for this parameter at Conveyal. In any case I wouldn't recommend setting this to 1, as that will mean that the accessibility results will be based on a single randomly-generated schedule for the frequency based routes which will likely be unstable.

The percentiles may not all be the same with a single departure time, if there are frequency based routes and there is more than one Monte Carlo draw.

I'd also argue for setting the default time window size to at least an hour—I really think that results are better when analyzed over a time window, and setting this parameter to do that by default will gently regularize this practice.

mattwigway commented 4 years ago

Also, just to correct the description of the issue, R5 uses travel times from later departures to bound the travel times at earlier departures, because a trip leaving at 10:00 cannot take longer than a trip leaving at 10:01 plus one minute - because you can always just wait at the origin and leave later.

rafapereirabr commented 4 years ago

Thank you both for your comments! I've created a time_window branch where I've implemented the parameters as follows:

setTimeWindowSize( ) Value equal to 60, a 1-hour interval
setNumberOfMonteCarloDraws( ) No input. Using R5 defautl
setPercentiles( ) Value equal to c(50L, 50L), to get the median.

Even though these are the defaul values, the function is returning and error when the user does not pass these parameters explicitly (see reprex below).

library(r5r)

# build transport network
data_path <- system.file("extdata", package = "r5r")
r5r_core <- setup_r5(data_path = data_path)

# load origin/destination points
points <- read.csv(system.file("extdata/poa_hexgrid.csv", package = "r5r"))[1:5,]

mode <- c("WALK", "TRANSIT")
max_walk_dist <- Inf
max_trip_duration <- 120L
departure_datetime <- as.POSIXct("13-03-2019 14:00:00",
                                format = "%d-%m-%Y %H:%M:%S")

# estimate travel time matrix
travel_time_matrix(r5r_core,
                         origins = points,
                         destinations = points,
                         mode,
                         departure_datetime,
                         max_walk_dist,
                         max_trip_duration
                          # , time_window= 60
                          # , percentiles =  c(50, 50)
                         )

mvpsaraiva commented 4 years ago

The problem is that the order of the parameters has changed. So, effectively, you're passing the value of max_walk_dist to time_window, and max_trip_duration to percentiles. Apart from that, there is a bug in the Java code that doesn't accept a single percentile as input, and I guess that's why you've set c(50, 50) as the default, instead of just 50. I'll fix this in the next version of the jar.

rafapereirabr commented 4 years ago

The function works if I specify the other parameters like this:

# estimate travel time matrix
travel_time_matrix(r5r_core,
                   origins = points,
                   destinations = points,
                   mode = mode,
                   departure_datetime = departure_datetime,
                   max_walk_dist = max_walk_dist,
                   max_trip_duration = max_trip_duration
                   )

You are right about the c(50, 50). I will wait for you to update the JAR file to do a pull request.

ansoncfit commented 4 years ago

See the guidance on number of draws here: https://docs.conveyal.com/analysis/configuration#simulated-schedules

We have a paper under review that characterizes the number of draws and convergence, I'll hope to share it soon.

rafapereirabr commented 4 years ago

Thanks for the link @ansoncfit . This is a good reference to point to in our documentation. Please let us know when you have the paper published so we can cite it in our documentation as well.

rafapereirabr commented 4 years ago

I've now added new parameters to the travel_time_matrx fuction to allow for multiple estimates over a time window set by the user. See pull request #117.

For now, the number of Monte Carlo simulations is 200, and the user has no control over it. We could however add this as a parameter for users to change, it would only add a bit more complexity for the end users who, in general, are not familiar with this. This is my concern.

rafapereirabr commented 4 years ago

Following the discusson on issue #118 , I seeting the number of Monte Carlo Draws to 5 times the size of time_window. A one-minute time window would use 5 draws, while a 60-minute window would use 300 draws and so on.

rafapereirabr commented 4 years ago

This capability has been implemented in the new version of r5r v0.2.0 published on CRAN today. closing this issue for now.

ipeaGIT / r5r

Return travel matrices each minute over a time window #104