more output from travel_time_matrix()

rafapereirabr commented 3 years ago

@mvpsaraiva , does R5 allow us to get more information from travel_time_matrix(), such as total distance? Or in the case of walking + transit, the walking distance?

mvpsaraiva commented 3 years ago

Yes, we can have lots of extra path information since R5 6.2.

Taking an example from here:

| origin | destination | routes | boardStops | alightStops | rideTimes | accessTime | egressTime | transferTime | waitTimes | totalTime | nIterations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 70\|553 | 1452\|86944 | 86944\|88339 | 16.0\|8.0 | 3.1 | 13.6 | 0 | 1.9\|10.0 | 52.6 | 3 |
| 1 | 2 | 70 | 1452 | 8846 | 30 | 3.1 | 2.6 | 0 | 1.9 | 37.6 | 57 |
| 1 | 3 | 70\|61 | 1452\|86944 | 88333\|7783 | 16.0\|17.0 | 3.1 | 17 | 0.6 | 1.9\|8.4 | 64 | 14 |
| 1 | 3 | 70\|61 | 1452\|86944 | 86944\|7783 | 16.0\|17.0 | 3.1 | 17 | 9.6 | 1.9\|8.4 | 72.9 | 46 |

The | character separates the per-leg values within a column. So, in the first row, the rider boarded route 70 at stop 1452 and got off at stop 86944, then boarded route 553 at stop 86944 and rode to stop 88339. We also get the access, egress, and transfer times for the trip, plus the wait and ride times for each leg.
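As a rough illustration of how the `|`-separated columns could be unpacked, here is a minimal base R sketch using the first row of the example above. The helper `sum_legs()` and the derived column names are hypothetical and not part of R5 or r5r; the sketch only checks that the per-leg times add up to `totalTime`.

```r
# First row of the example output; "|" separates the value of each transit leg.
path <- data.frame(
  routes       = "70|553",
  rideTimes    = "16.0|8.0",
  waitTimes    = "1.9|10.0",
  accessTime   = 3.1,
  egressTime   = 13.6,
  transferTime = 0,
  totalTime    = 52.6,
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each "|"-separated string and sum its numeric parts.
sum_legs <- function(x) {
  vapply(
    strsplit(x, "|", fixed = TRUE),
    function(parts) sum(as.numeric(parts)),
    numeric(1)
  )
}

path$ride_total <- sum_legs(path$rideTimes)
path$wait_total <- sum_legs(path$waitTimes)

# Number of transit legs, counted from the routes column.
path$n_rides <- lengths(strsplit(path$routes, "|", fixed = TRUE))

# Access + ride + wait + transfer + egress should reproduce totalTime.
recomputed <- with(
  path,
  accessTime + ride_total + wait_total + transferTime + egressTime
)
```

Collapsing the per-leg values like this is one way to end up with a single number per column, which is easier to return in a flat travel time matrix.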

The question is: how much of this information do we want to pass along to r5r, and how?

rafapereirabr commented 3 years ago

This is brilliant. Now to the question "how much of this information we want to pass along to r5r, and how." Here are my two cents:

I don't think it would be necessary to output info on routes, boardStops and alightStops. The user could get this from the detailed_itineraries() function.
I think it would be great to keep the columns totalTime, egressTime, transferTime, waitTimes. This would make the output much richer at a very low computational cost. I'm just not sure how these columns could be presented when a user sets more than one percentile, for example percentiles = c(20, 50, 80)

ps. What does the nIterations column mean?

mvpsaraiva commented 3 years ago

Great! I agree that breaking down travel times into segments is quite useful, and requires much less work than breaking down the whole itinerary. Even more so considering that stop_ids alone are not very useful, and we probably would have to fetch their lat lon coordinates as well.

About the percentiles, those travel time segments do not interact directly with them. Basically, we can choose to get the average of those statistics in the time window, or the minimum (the fastest trip in the time window).

> ps. What does the nIterations column mean?

From the documentation:

nIterations: number of departure minutes in the departure time window at which this path is optimal.

The optimal path between any OD pair varies during the time window due to transit schedules, so nIterations indicates how many times that particular itinerary resulted in the shortest travel time.
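To make the "minimum versus average" choice concrete, here is a small base R sketch using the `totalTime` and `nIterations` values from the example table. Weighting the average by `nIterations` is an assumption about how such an average could be computed, not a description of what R5 or r5r actually does.

```r
# Path alternatives per OD pair, taken from the example table in this thread.
paths <- data.frame(
  origin      = c(1, 1, 1, 1),
  destination = c(2, 2, 3, 3),
  totalTime   = c(52.6, 37.6, 64.0, 72.9),
  nIterations = c(3, 57, 14, 46)
)

# Group the alternatives by OD pair.
by_od <- split(paths, list(paths$origin, paths$destination), drop = TRUE)

# For each pair, report the fastest path and an average weighted by the
# number of departure minutes at which each path was optimal (assumption).
summary_od <- do.call(rbind, lapply(by_od, function(d) {
  data.frame(
    origin             = d$origin[1],
    destination        = d$destination[1],
    min_time           = min(d$totalTime),
    weighted_mean_time = weighted.mean(d$totalTime, d$nIterations)
  )
}))
```

For the pair 1 to 2, the fastest path takes 37.6 minutes, and because it is optimal in 57 of the 60 iterations, the weighted mean stays close to it.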

We have a caveat, though: we can only get detailed path information for up to 5000 destination points. This limit is hardcoded in R5 in this line. Perhaps that's something we can discuss with Conveyal as well.

cseveren commented 2 years ago

For what it's worth, an expanded travel_time_matrix that reports access and egress times would fit exactly a use case that has pushed me to use detailed_itineraries; I don't need all the detail that the latter command returns, but want to know (e.g.,) how much walking is required for a given fastest trip.

rafapereirabr commented 2 years ago

I'm glad to hear you'll find this useful as well @cseveren .

Thanks for the clarification, @mvpsaraiva. Regarding this hardcoded limit upstream in R5, it should preferably be overridden on the Java side of r5r, but I'm not sure that's possible. Is it? If not, we should check with @ansoncfit and the Conveyal team whether the limit could be changed upstream in R5.

mvpsaraiva commented 2 years ago

It would be quite easy to change that limit in our own R5 jar, but I've just created an issue in Conveyal's repository to suggest a change upstream.

rafapereirabr commented 2 years ago

Documentation suggestion:

#' @param breakdown logical. If `FALSE` (default), the function returns a simple
#'                  output with columns origin, destination and travel time
#'                  percentiles. If `TRUE`, r5r breaks down the trip information
#'                  and returns more columns with estimates of `access_time`,
#'                  `waiting_time`, `ride_time`, `transfer_time`, `total_time`,
#'                  `n_rides` and `route`. Warning: Setting `TRUE` makes the
#'                  function significantly slower.
#'
#' @param breakdown_stat string. If `min`, all the broken-down trip information
#'        is based on the trip itinerary with the smallest waiting time in the
#'        time window. If `breakdown_stat = mean`, the information is based on
#'        the trip itinerary whose waiting time is the closest to the average
#'        waiting time in the time window.

rafapereirabr commented 2 years ago

@mvpsaraiva , I think we are ready to merge the dev branch into master, right?

mvpsaraiva commented 2 years ago

Agreed!

CWen001 commented 2 years ago

Hello, @rafapereirabr, @mvpsaraiva. Is it still planned to add the information by distance (total distance; walking distance to/from the transit)?

For now, I'm wondering if there are any best practices for calculating distances from the travel time output. I saw in the documentation that the default average walking speed is 3.6 km/h and the cycling speed is 12 km/h. But for bus and other transit modes, converting time to total distance via speed might not be straightforward.

Thank you very much for this powerful package.

rafapereirabr commented 2 years ago

Hi @CWen001 . I'm not entirely sure it's possible to extract trip distance information from R5, but @mvpsaraiva will be able to confirm that.

In any case, it can be tricky to get distance info for public transport trips. This is because trip distance info depends on the shapes.txt file in the GTFS input, and many GTFS feeds do not have that file.

mvpsaraiva commented 2 years ago

Hi @CWen001. The only way you can get information on travel distances is with detailed_itineraries(). The outputs that R5 provides for travel_time_matrix() don't include that information, so we can't pass it along to r5r users. Conveyal would need to update R5 to compute that information, but I'm quite sure this is not in their plans (for many technical and practical reasons).

Calculating walking and cycling distances from times is relatively straightforward, but it's not 100% accurate. You wouldn't be considering topography, for example, or turn restrictions. I also believe R5 may add a small penalty to walking times when pedestrians need to cross busy/large streets.
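A minimal sketch of that conversion in base R, assuming a constant walking speed of 3.6 km/h (the default mentioned earlier in this thread) and using the access and egress times from the first row of the example table. The helper `time_to_km()` is hypothetical, and the result is only a rough approximation for the reasons given above.

```r
# Default walking speed quoted earlier in this thread (km/h).
walk_speed_kmh <- 3.6

# Hypothetical helper: convert minutes travelled at a constant speed into km.
time_to_km <- function(minutes, speed_kmh) {
  speed_kmh * minutes / 60
}

# Access and egress walking times (minutes) from the first example row.
access_km <- time_to_km(3.1, walk_speed_kmh)
egress_km <- time_to_km(13.6, walk_speed_kmh)

# Approximate total walking distance for that trip, roughly 1 km.
walk_km <- access_km + egress_km
```

The same helper would apply to cycling legs by passing 12 km/h as `speed_kmh`, with the same caveats about topography and crossing penalties.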

Estimating travel distances by public transport is even more complicated and, as @rafapereirabr said, impossible in many situations.

CWen001 commented 2 years ago

Thank you very much for your replies, @rafapereirabr, @mvpsaraiva. Now I understand the complexities under the hood.

One use case for our urban planning department is to calculate a large travel time matrix. But somehow the travel times from r5r (with default settings) are about half of what we got when we sampled the same origins and destinations with Google services. As local people are very familiar with and sensitive to travel times in the city, they might question the results. Since the distance is unavailable, I think the way to go is to carefully study and test each parameter of travel_time_matrix() to tune the time results.

Still, a lot of respect for the wonderful open-source package and the developers.

rafapereirabr commented 2 years ago

@CWen001 , are you using the exact same GTFS feed in r5r and in Google services? I'm curious about the root cause of this large difference, but we need to make sure we're comparing the same GTFS input.

mvpsaraiva commented 2 years ago

I agree with @rafapereirabr that such a large difference (half the time!) is most likely due to differences between the GTFS feeds.

One experiment you can try is to set the parameters time_window = 60 and percentiles = c(5, 25, 50, 75, 95). Then you can see if any of those travel time percentiles are close to Google's estimates.
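A sketch of what that experiment could look like, wrapped in a hypothetical function `percentile_check()` so that nothing runs until it is called. It assumes r5r and Java are installed and that `setup_r5()` and `stop_r5()` are available in the installed r5r version; argument names may differ between versions.

```r
# Sketch only: needs the r5r package and a Java installation to actually run.
# `data_path` is a folder with the OSM .pbf and GTFS files; `points` is a
# data.frame with id, lon and lat columns (both supplied by the caller).
percentile_check <- function(data_path, points, departure_datetime) {
  r5r_core <- r5r::setup_r5(data_path = data_path)
  on.exit(r5r::stop_r5(r5r_core), add = TRUE)

  # A 60-minute departure window with several percentiles shows how much
  # travel times vary with the exact departure minute.
  r5r::travel_time_matrix(
    r5r_core,
    origins = points,
    destinations = points,
    mode = c("WALK", "TRANSIT"),
    departure_datetime = departure_datetime,
    time_window = 60,
    percentiles = c(5, 25, 50, 75, 95)
  )
}
```

The returned table has one travel time column per requested percentile, which can then be placed side by side with the Google estimates for the same OD pairs.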

ipeaGIT / r5r

more output from travel_time_matrix() #194