ipeaGIT / r5r

https://ipeagit.github.io/r5r/
Other
178 stars 27 forks source link

FR: Support for multiple opportunities #218

Closed botanize closed 2 years ago

botanize commented 2 years ago

We are often interested in access to a set of opportunities, for example, all jobs, low-, medium- and high-wage jobs, health-care, higher-education. In the past we calculated a travel-time matrix using OTP once and applied our decay function(s) to each opportunity column. With r5r we can't get the full monte-carlo travel-time-matrix to correctly calculate accessibility on our own, and the accessibility function only accepts a single opportunities column. That means if we use r5r we need to recalculate the computationally expensive part (travel-time matrix) for each opportunity of interest.

There are at least two options, with different trade-offs, and they're not mutually exclusive:

  1. Provide an option for saving the full travel-time matrix (for each minute in the time_window and monte-carlo draw) so users can post-process to their liking, including applying custom decay functions and processing multiple opportunities from a single travel time matrix. This provides the most flexibility, but also requires the most technical capability from users. It might also be the easiest to implement, though I don't know.
  2. Accept multiple opportunity columns in the destinations table, e.g., c("jobs", "pharmacies"). This would be the simplest for most users and solves the multiple opportunities problem, but not doesn't allow us to apply multiple decay functions (it's often desirable to present both cutoff and gravity measures) to the same travel-time matrix.
rafapereirabr commented 2 years ago

Hi @botanize, thank you for opening this issue. It is already possible to generate a monte-carlo travel-time-matrix using the travel_time_matrix() function with the time_window parameter. Is this what you had in mind?

dhersz commented 2 years ago

Hi @botanize, thank you for opening this issue. It is already possible to generate a monte-carlo travel-time-matrix using the travel_time_matrix() function with the time_window parameter. Is this what you had in mind?

The problem with this approach is that you'd have the accessibility based on the median (or whichever percentile you chose) travel time, which is different than the result of the accessibility function (AFAIK), which is the median accessibility in the given time window.

botanize commented 2 years ago

Yeah, exactly. travel_time_matrix returns a summary (quantiles) of the full monte-carlo matrix, so instead of 5 monte-carlo draws x 60 minute window = 300 travel times for each OD pair we get one per quantile (max of 5 quantiles). Since E(f(x)) != f(E(x)), we can't actually calculate accessibility correctly from the result of travel_time_matrix.

mvpsaraiva commented 2 years ago

Hi @botanize. Thanks for your input. I'll try to answer your suggestions here, considering the possibilities of what R5 currently offers.

  1. Provide an option for saving the full travel-time matrix (for each minute in the time_window and monte-carlo draw) so users can post-process to their liking, including applying custom decay functions and processing multiple opportunities from a single travel time matrix. This provides the most flexibility, but also requires the most technical capability from users. It might also be the easiest to implement, though I don't know.

This is not possible, as R5 discards that information once travel time percentiles are computed. Extracting that data would require some serious hacking deep inside R5's codebase. Also, the full monte-carlo matrix would be huge in memory. If you absolutely need that full matrix, you can build one manually. In your example (5 monte-carlo draws x 60 minute window = 300), you can call travel_time_matrix() 300 times changing departure_time minute by minute and calling the function 5 times for each minute. Just have in mind that if your gtfs is based on stop_times, then the 5 calls for each minute will return the exact same result, because the monte-carlo randomisation of schedules only works on frequencies-based gtfs. I just must say that I absolutely do not recommend following this route, as I see no benefits at all in going down this rabbit hole. I'm just stating what's technically possible ;)

  1. Accept multiple opportunity columns in the destinations table, e.g., c("jobs", "pharmacies"). This would be the simplest for most users and solves the multiple opportunities problem, but not doesn't allow us to apply multiple decay functions (it's often desirable to present both cutoff and gravity measures) to the same travel-time matrix.

This one is technically possible. I've checked it, and R5 allows the input of multiple opportunities datasets to a single request. But, as you said, only one decay function is allowed per request, so the accessibility() function would have to be called once per decay function. However, this feature requires significant work in r5r's Java backend, so it will have to be scheduled for some future version of the package.

--

Finally, I've talked to @dhersz about the issue of "accessibility at the median travel time" vs "median accessibility in time window", and we think we need to clarify this in the documentation. Internally, R5 first calculates the percentiles of travel times based on the full monte-carlo matrix. Only after that, it calculates the accessibility at each percentile. Thus, if you generate a travel time matrix and then manually calculate accessibility based on it, you will get the exact same result as calling the accessibility function directly. We actually made that test when we were writing the accessibility function on issue #169.

Sorry it took me a while to answer, I only saw this issue now. I hope this helps.

botanize commented 2 years ago
  1. Provide an option for saving the full travel-time matrix (for each minute in the time_window and monte-carlo draw) so users can post-process to their liking, including applying custom decay functions and processing multiple opportunities from a single travel time matrix. This provides the most flexibility, but also requires the most technical capability from users. It might also be the easiest to implement, though I don't know.

This is not possible, as R5 discards that information once travel time percentiles are computed. Extracting that data would require some serious hacking deep inside R5's codebase. Also, the full monte-carlo matrix would be huge in memory. If you absolutely need that full matrix, you can build one manually. In your example (5 monte-carlo draws x 60 minute window = 300), you can call travel_time_matrix() 300 times changing departure_time minute by minute and calling the function 5 times for each minute. Just have in mind that if your gtfs is based on stop_times, then the 5 calls for each minute will return the exact same result, because the monte-carlo randomisation of schedules only works on frequencies-based gtfs. I just must say that I absolutely do not recommend following this route, as I see no benefits at all in going down this rabbit hole. I'm just stating what's technically possible ;)

Yes, it would be huge in memory, but I don't need it in memory, writing to one file per minute and monte-carlo draw would be sufficient. But I understand that getting pre-percentile travel-time matrices from R5 is difficult, and I can live with generating each matrix myself.

  1. Accept multiple opportunity columns in the destinations table, e.g., c("jobs", "pharmacies"). This would be the simplest for most users and solves the multiple opportunities problem, but not doesn't allow us to apply multiple decay functions (it's often desirable to present both cutoff and gravity measures) to the same travel-time matrix.

This one is technically possible. I've checked it, and R5 allows the input of multiple opportunities datasets to a single request. But, as you said, only one decay function is allowed per request, so the accessibility() function would have to be called once per decay function. However, this feature requires significant work in r5r's Java backend, so it will have to be scheduled for some future version of the package.

This would be a huge help, my current work-around is to run accessibility for every combination of opportunities (up to 12!) and decay functions (usually two at most), so collapsing that to two runs would be very valuable and save a ton of time.

Finally, I've talked to @dhersz about the issue of "accessibility at the median travel time" vs "median accessibility in time window", and we think we need to clarify this in the documentation. Internally, R5 first calculates the percentiles of travel times based on the full monte-carlo matrix. Only after that, it calculates the accessibility at each percentile. Thus, if you generate a travel time matrix and then manually calculate accessibility based on it, you will get the exact same result as calling the accessibility function directly. We actually made that test when we were writing the accessibility function on issue #169.

Looks like this was brought up in R5 issue #66 as well, though that issue doesn't describe how they resolved it. OpenTripPlanner issue 2148 seems to say they solved the problem by redefining accessibility to mean the accessibility of the median travel-times. But that is not correct, and as they say, conflicts with Accessibility Observatory methods.

I created and R5 issue, but I'm basically trying to get the speed of R5, with correctness of summarizing the accessibility, not the travel-times.

abyrd commented 2 years ago

Hi everyone, I just posted an answer to @botanize's issue on R5 at https://github.com/conveyal/r5/issues/785#issuecomment-1024874122. To summarize, this is an intentional methodological choice tied to our primary use case in scenario planning, and our desire to encourage people to talk openly about the uncertainty in model outputs. We don't really think in terms of one "correct" accessibility definition, and it's probably more fruitful to talk in terms of interpretations, use cases, and the isolation of different sources of variation and uncertainty.

abyrd commented 2 years ago

This one is technically possible. I've checked it, and R5 allows the input of multiple opportunities datasets to a single request. But, as you said, only one decay function is allowed per request, so the accessibility() function would have to be called once per decay function. However, this feature requires significant work in r5r's Java backend, so it will have to be scheduled for some future version of the package.

This would be a huge help, my current work-around is to run accessibility for every combination of opportunities (up to 12!) and decay functions (usually two at most), so collapsing that to two runs would be very valuable and save a ton of time.

Correct, in a single regional analysis, R5 can do a cross-product of travel time thresholds, percentiles, and destination sets, and our analysis-ui does so by default. This allows each shortest path tree and transit-to-grid propagation (the bulk of the computation) to be reused for dozens of outputs at once.

It is on our UI roadmap to allow combining outputs for different destination layers into compound indicators (e.g. "weight of 500 for reaching at least one hospital and grocery store, plus 2x the number of jobs available in under 30 minutes") but I don't think it occurred to us to allow different decay functions per destination set in a single run. That should not be very complicated to implement, as usual the tricky part is how to specify it in our request objects without breaking backward compatibility.

Looks like this was brought up in R5 issue #66 as well, though that issue doesn't describe how they resolved it. OpenTripPlanner issue 2148 seems to say they solved the problem by redefining accessibility to mean the accessibility of the median travel-times. But that is not correct, and as they say, conflicts with Accessibility Observatory methods.

We are fans of the Accessibility Observatory work and were very happy to see OTP used to promote access concepts and hopefully raise awareness of them in policy. Their publications have certainly influenced and inspired our work. However, I would caution against taking a highly visible or large scale interpretation of a concept as the "correct" one. It is one approach with a particular focus, and there may even be pragmatic factors behind this choice: averaging instantaneous accessibility can be done by varying input parameters to an existing system (OTP).

We went to a lot of effort to rewrite the entire routing system in R5 not just to make it faster, but to intentionally make deeper methodological changes that we couldn't readily achieve by ranging over the input parameter space of OTP.

mvpsaraiva commented 2 years ago

I've added support for multiple opportunities in version 1.0, currently on the dev branch. It's implemented as suggested by @botanize, the user just needs to set the parameter opportunities_colnames = c("jobs", "pharmacies") with column names available in the destination points data.frame. Please note that the parameter name was changed from the previous opportunities_colname.

rafapereirabr commented 2 years ago

Thanks, @mvpsaraiva . Here's a reproducible example:

library(r5r)

data_path <- system.file("extdata/poa", package = "r5r")
r5r_core <- setup_r5(data_path)
points <- read.csv(file.path(data_path, "poa_hexgrid.csv"))[1:100, ]

departure_datetime <- as.POSIXct(
  "13-05-2019 14:00:00",
  format = "%d-%m-%Y %H:%M:%S")

access <- accessibility(
  r5r_core,
  origins = points,
  destinations = points,
  opportunities_colnames = c("schools", "healthcare"),
  mode = "TRANSIT",
  departure_datetime = departure_datetime,
  decay_function = "step",
  cutoffs = 30,
  max_trip_duration = 60
)

head(access)
>                 id opportunity percentile cutoff accessibility
> 1: 89a901291abffff     schools         50     30             1
> 2: 89a901291abffff  healthcare         50     30             1
> 3: 89a9012a3cfffff     schools         50     30             0
> 4: 89a9012a3cfffff  healthcare         50     30             0
> 5: 89a901295b7ffff     schools         50     30             2
> 6: 89a901295b7ffff  healthcare         50     30             3
botanize commented 2 years ago

Very exciting! Thanks for the great work!