ipeaGIT / r5r

https://ipeagit.github.io/r5r/

use detailed_itineraries on Frequency-based GTFS feeds #181

Closed rafapereirabr closed 2 years ago

rafapereirabr commented 3 years ago

The fact that r5r does not run detailed_itineraries on frequency-based GTFS feeds is a limitation upstream in R5, unfortunately. We hope this will be solved soon.
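
Whether a given feed is affected can be checked before routing. The following is a minimal sketch, assuming the feed has already been read into a named list of tables; `is_frequency_based` is a hypothetical helper written for illustration and is not part of r5r:

```r
# Hypothetical helper: TRUE when the feed carries a non-empty frequencies table.
is_frequency_based <- function(gtfs) {
  freq <- gtfs[["frequencies"]]
  !is.null(freq) && nrow(freq) > 0
}

# Toy feeds standing in for real GTFS objects (illustrative values only)
freq_feed <- list(
  stop_times  = data.frame(trip_id = "t1", stop_sequence = 1:2),
  frequencies = data.frame(trip_id = "t1", start_time = "06:00:00",
                           end_time = "09:00:00", headway_secs = 600)
)
timetable_feed <- list(
  stop_times = data.frame(trip_id = "t1", stop_sequence = 1:2)
)

is_frequency_based(freq_feed)       # feed with frequencies.txt
is_frequency_based(timetable_feed)  # plain timetable feed
```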

In the meantime, one alternative would be to use the gtfs2gps package to generate a new stop_times.txt table and replace the existing one in your GTFS feed. This example might do the trick:

library(r5r)
library(gtfs2gps)
library(data.table)

# gtfs
data_path <- system.file("extdata/spo", package = "r5r")
gtfs_file <- system.file("extdata/spo/spo.zip", package="r5r")

gtfs_list <- read_gtfs(gtfs_file)

new_stop_times <- gtfs2gps(gtfs_list, parallel = TRUE, spatial_resolution = 500)
head(new_stop_times)

# keep only observations with stop_id
new_stop_times <- subset(new_stop_times, !is.na(stop_id))

# select columns
new_stop_times <- new_stop_times[, .(trip_id, departure_time, stop_id, stop_sequence)]
head(new_stop_times)

# update stop_times and drop frequencies.txt
gtfs_list$stop_times <- new_stop_times
gtfs_list$frequencies <- NULL

# export gtfs
gtfs2gps::write_gtfs(gtfs_list, zipfile = 'new_gtfs.zip')

cseveren commented 3 years ago

Thanks for providing this example and great work on the r5r package.

I think this example may not fully solve the upstream integration with R5, however, because it omits stop_times$arrival_time and according to the GTFS specification "[a]n arrival time must be specified for the first and the last stop in a trip." We're struggling to get this working using a frequency-defined GTFS feed.

So, I recommend the following change to the example (will try out in the next day or two to see if successful and reply back):

library(r5r)
library(gtfs2gps)
library(data.table)

# gtfs
data_path <- system.file("extdata/spo", package = "r5r")
gtfs_file <- system.file("extdata/spo/spo.zip", package="r5r")

gtfs_list <- read_gtfs(gtfs_file)

new_stop_times <- gtfs2gps(gtfs_list, parallel = TRUE, spatial_resolution = 500)
head(new_stop_times)

# keep only observations with stop_id and replicate departure_time to arrival_time
new_stop_times <- subset(new_stop_times, !is.na(stop_id))
new_stop_times$arrival_time <- new_stop_times$departure_time

# select columns
new_stop_times <- new_stop_times[, .(trip_id, departure_time, arrival_time, stop_id, stop_sequence)]
head(new_stop_times)

# update stop_times and drop frequencies.txt
gtfs_list$stop_times <- new_stop_times
gtfs_list$frequencies <- NULL

# export gtfs
gtfs2gps::write_gtfs(gtfs_list, zipfile = 'new_gtfs.zip')

Also, I'm not exactly sure how gtfs2gps works, so I'm not sure whether it would be best practice to also set stop_times$timepoint=0 and, if so, whether I should set stop_times$timepoint=1 for the first and last points in a sequence.
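
The GTFS requirement quoted above (an arrival time for the first and last stop of each trip) can be checked on any stop_times table before exporting. This is a minimal sketch on toy data; `trips_missing_end_arrivals` is a hypothetical helper, and it assumes times are stored as character strings:

```r
# Hypothetical helper: returns the trip_ids whose first or last stop
# lacks an arrival_time (NA or empty string).
trips_missing_end_arrivals <- function(stop_times) {
  st <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]
  missing  <- is.na(st$arrival_time) | st$arrival_time == ""
  is_first <- !duplicated(st$trip_id)
  is_last  <- !duplicated(st$trip_id, fromLast = TRUE)
  unique(st$trip_id[missing & (is_first | is_last)])
}

# Toy table: trip "a" only misses an intermediate arrival (allowed),
# trip "b" misses the arrival at its first stop (not allowed).
stop_times <- data.frame(
  trip_id        = c("a", "a", "a", "b", "b"),
  stop_sequence  = c(1, 2, 3, 1, 2),
  arrival_time   = c("08:00:00", NA, "08:20:00", NA, "09:10:00"),
  departure_time = c("08:00:00", "08:10:00", "08:20:00", "09:00:00", "09:10:00"),
  stringsAsFactors = FALSE
)

trips_missing_end_arrivals(stop_times)
#> [1] "b"
```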

rafapereirabr commented 3 years ago

Hi @cseveren. Thanks for the support and feedback. Yes, you're right. My suggestion above does not work correctly because the gtfs2gps package currently only operates on the departure_time. We should be able to change that soon, though.

In the meantime, I believe your suggestion should work well as a temporary fix.

rafapereirabr commented 2 years ago

One simple short-term solution to this problem would be to:

  1. generate a new stop_times table using the gtfs2gps package
  2. manually create an arrival_time column. This can be calculated by subtracting, for example, 30 seconds from departure_time, to represent the time each vehicle waits at each stop for passengers to embark/disembark.
  3. remove the frequencies.txt file and save the new GTFS.
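
Step 2 can be sketched in base R as follows. This is only an illustration on toy data: it assumes departure_time is held as "HH:MM:SS" strings (the table returned by gtfs2gps may use a different time class), `to_secs` and `to_hms` are hypothetical helpers, and the 30-second dwell time is the example value from the list above.

```r
# Hypothetical helpers for "HH:MM:SS" strings; hours may exceed 23 in GTFS.
to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}
to_hms <- function(s) {
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60),
          as.integer(s %% 60))
}

dwell_secs <- 30  # assumed dwell time at each stop

# Toy stop_times table (illustrative values only)
stop_times <- data.frame(
  trip_id = "t1",
  stop_sequence = 1:3,
  departure_time = c("23:59:50", "24:05:00", "24:10:15"),
  stringsAsFactors = FALSE
)

# arrival = departure minus dwell time, never earlier than 00:00:00
stop_times$arrival_time <-
  to_hms(pmax(to_secs(stop_times$departure_time) - dwell_secs, 0))

stop_times$arrival_time
#> [1] "23:59:20" "24:04:30" "24:09:45"
```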

cseveren commented 2 years ago

There's another issue with the solution proposed above that utilizes gtfs2gps; it is discussed in https://github.com/ipeaGIT/gtfs2gps/issues/213. The problem is that trip_ids as generated by gtfs2gps are not unique for trips.

Here's an alternative solution that uses gtfsrouter's frequencies_to_stop_times function and obviates the need to address the above issue. It only requires one small custom function to ensure that hms-style times can exceed 24:00:00 (consistent with GTFS requirements for trips crossing midnight).

library(gtfs2gps)
library(gtfsrouter)
library(dplyr)
library(r5r)

# helper function to correctly format times
hms_gt24 <- function(t){
  paste(formatC(t %/% (60*60), width = 2, format = "d", flag = "0"),
        formatC(t %% (60*60) %/% 60, width = 2, format = "d", flag = "0"),
        formatC(t %% 60, width = 2, format = "d", flag = "0"),
        sep = ":"
  )
}

# gtfs setup
data_path <- system.file("extdata/spo", package = "r5r")
gtfs_file <- system.file("extdata/spo/spo.zip", package="r5r")

# gtfs convert frequencies to stop times
gtfs_list <- extract_gtfs(gtfs_file) %>%
  frequencies_to_stop_times()

gtfs_list$frequencies <- NULL

gtfs_list$stop_times$arrival_time <- hms_gt24(gtfs_list$stop_times$arrival_time)
gtfs_list$stop_times$departure_time <- hms_gt24(gtfs_list$stop_times$departure_time)

# export gtfs
gtfs2gps::write_gtfs(gtfs_list, zipfile = 'new_gtfs.zip')
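
To see what the helper does, here is a small usage example. It assumes, as the code above implies, that the times passed in are numeric seconds since midnight; the input values are made up for illustration.

```r
# Copy of the helper above, repeated so the snippet runs on its own
hms_gt24 <- function(t) {
  paste(formatC(t %/% (60 * 60), width = 2, format = "d", flag = "0"),
        formatC(t %% (60 * 60) %/% 60, width = 2, format = "d", flag = "0"),
        formatC(t %% 60, width = 2, format = "d", flag = "0"),
        sep = ":")
}

# 90000 seconds is one hour past midnight on the following day
result <- hms_gt24(c(0, 3661, 90000))
result
#> [1] "00:00:00" "01:01:01" "25:00:00"
```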

dhersz commented 2 years ago

Just to let you all know, the development version of {gtfstools} also includes a frequencies_to_stop_times() function. It handles times above 24:00:00 and can also be used to convert only a few trips, instead of the entire frequencies table.

The entire workflow can be shortened to:

library(gtfstools)

frequencies_gtfs <- read_gtfs("path/to/frequencies/gtfs.zip")

stop_times_gtfs <- frequencies_to_stop_times(frequencies_gtfs)
write_gtfs(stop_times_gtfs, "path/to/stop_times/gtfs.zip")
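
Converting only selected trips might look like the sketch below. It assumes the `trip_id` argument of gtfstools' frequencies_to_stop_times() and uses the sample feed bundled with the package; the trip "CPTM L07-0" is chosen only because it appears in that feed's frequencies table.

```r
library(gtfstools)

# sample frequency-based feed shipped with gtfstools
path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
gtfs <- read_gtfs(path)

# convert a single trip; the other trips should stay in the frequencies table
partial_gtfs <- frequencies_to_stop_times(gtfs, trip_id = "CPTM L07-0")

# is the trip still described by frequencies before and after the conversion?
"CPTM L07-0" %in% gtfs$frequencies$trip_id
"CPTM L07-0" %in% partial_gtfs$frequencies$trip_id
```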

cseveren commented 2 years ago

@dhersz This is great! One question: is your implementation different from, or faster than, the one in gtfs-router (see this issue)? The recursive loop they use is quite slow because it doesn't preallocate space, so at scale each save operation is costly. I suggested a hack-y solution on this fork that also doesn't preallocate, but saves in a series of recursive steps.

I looked through your code and the use of templates seems nice, but I couldn't follow where expanded freqs/trips were being saved.

dhersz commented 2 years ago

@cseveren The implementation is quite different, as you have correctly noted, and I believe it's considerably faster as well. The updated frequencies and stop times tables are saved into the gtfs object that is returned by the function. You can check that in the code from the "third step" (based on the comments) onwards.

A small benchmark of gtfsrouter's and gtfstools' functions:

path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")

gtfstools_gtfs <- gtfstools::read_gtfs(path)
gtfsrouter_gtfs <- gtfsrouter::extract_gtfs(path, quiet = TRUE)

microbenchmark::microbenchmark(
  cvt_gtfstools_gtfs <- gtfstools::frequencies_to_stop_times(gtfstools_gtfs),
  cvt_gtfsrouter_gtfs <- gtfsrouter::frequencies_to_stop_times(gtfsrouter_gtfs),
  times = 5L
)
#> Unit: milliseconds
#>                                                                           expr
#>     cvt_gtfstools_gtfs <- gtfstools::frequencies_to_stop_times(gtfstools_gtfs)
#>  cvt_gtfsrouter_gtfs <- gtfsrouter::frequencies_to_stop_times(gtfsrouter_gtfs)
#>         min        lq      mean   median         uq        max neval
#>    777.2107   793.785   875.153   925.58   937.2485   941.9406     5
#>  26631.9393 31469.455 31593.529 32363.29 32884.5130 34618.4487     5
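
For a rough sense of scale, the ratio of the two median timings reported above can be computed directly. This only restates the numbers from this single run on the sample feed; other feeds and machines will give different results.

```r
# median timings (milliseconds) copied from the benchmark output above
median_gtfstools_ms  <- 925.58
median_gtfsrouter_ms <- 32363.29

# how many times longer the gtfsrouter conversion took in this run
speedup <- median_gtfsrouter_ms / median_gtfstools_ms
round(speedup)
#> [1] 35
```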

(and apparently gtfsrouter's doesn't update the frequencies table, which seems more like a bug than an actual behaviour that should be relied upon)

cvt_gtfstools_gtfs$frequencies
#> NULL

cvt_gtfsrouter_gtfs$frequencies
#>         trip_id start_time end_time headway_secs
#>   1: CPTM L07-0      14400    17940          720
#>   2: CPTM L07-0      18000    21540          360
#>   3: CPTM L07-0      21600    25140          360
#>   4: CPTM L07-0      25200    28740          360
#>   5: CPTM L07-0      28800    32340          360
#>  ---                                            
#> 700:  5290-10-1      79200    82740         1200
#> 701:  5290-10-1      82800    86340         1200
#> 702:  6450-51-0      18000    21540         3600
#> 703:  6450-51-0      21600    25140         3600
#> 704:  6450-51-0      25200    28740         3600

cseveren commented 2 years ago

@dhersz checks out on my end, works great! thanks.

rafapereirabr commented 2 years ago

We believe the best solution here is to advise users to use the gtfstools solution above. Closing this issue for now.

rafapereirabr commented 2 months ago

could you please share a reproducible example of the error?

1022SO commented 2 months ago

Hello @rafapereirabr, I have revised my code and my input data, and now get a different error (below). I believe I will just delete my original comment right now to avoid confusion for other users. I will delete this message too, once you have seen it. I will work on debugging my code a bit more before reaching out again if need be.

"Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException"

rafapereirabr commented 2 months ago

thanks @1022SO