rafapereirabr closed this issue 2 years ago.
Thanks for providing this example and great work on the r5r package.
I think this example may not fully solve the upstream integration with R5, however, because it omits stop_times$arrival_time, and according to the GTFS specification "[a]n arrival time must be specified for the first and the last stop in a trip." We're struggling to get this working using a frequency-defined GTFS feed.
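To check whether a stop_times table meets the requirement quoted above, a small base R check can list the trips whose first or last stop (by stop_sequence) has no arrival_time. This is a minimal sketch, not part of r5r, gtfs2gps or any other package; the function name trips_missing_endpoint_arrivals is hypothetical, and it assumes stop_times is a data.frame with trip_id and stop_sequence columns.

```r
# Hypothetical check: return the trip_ids whose first or last stop (ordered by
# stop_sequence) has a missing or empty arrival_time.
trips_missing_endpoint_arrivals <- function(stop_times) {
  # If the column is absent entirely, every trip violates the rule.
  if (!"arrival_time" %in% names(stop_times)) {
    return(unique(stop_times$trip_id))
  }
  st <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]
  first <- !duplicated(st$trip_id)                  # first stop of each trip
  last <- !duplicated(st$trip_id, fromLast = TRUE)  # last stop of each trip
  missing <- is.na(st$arrival_time) | st$arrival_time == ""
  unique(st$trip_id[(first | last) & missing])
}
```

An empty result means every trip has arrival times at both ends; a feed whose stop_times lacks the arrival_time column returns all of its trip_ids.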
So, I recommend the following change to the example (will try out in the next day or two to see if successful and reply back):
library(r5r)
library(gtfs2gps)
library(data.table)
# gtfs
data_path <- system.file("extdata/spo", package = "r5r")
gtfs_file <- system.file("extdata/spo/spo.zip", package="r5r")
gtfs_list <- read_gtfs(gtfs_file)
new_stop_times <- gtfs2gps(gtfs_list, parallel = TRUE, spatial_resolution = 500)
head(new_stop_times)
# keep only observations with stop_id and replicate departure_time to arrival_time
new_stop_times <- subset(new_stop_times, !is.na(stop_id))
new_stop_times$arrival_time <- new_stop_times$departure_time
# select columns
new_stop_times <- new_stop_times[, .(trip_id, departure_time, arrival_time, stop_id, stop_sequence)]
head(new_stop_times)
# update stop_times and drop frequencies.txt
gtfs_list$stop_times <- new_stop_times
gtfs_list$frequencies <- NULL
# export gtfs
gtfs2gps::write_gtfs(gtfs_list, zipfile = 'new_gtfs.zip')
Also, I'm not exactly sure how gtfs2gps works, so I'm not sure whether it would be best practice to also code stop_times$timepoint = 0 and, if so, whether I should specify stop_times$timepoint = 1 for the first and last points in a sequence.
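The timepoint coding asked about above could be sketched in base R as follows. In the GTFS specification, timepoint = 1 marks times as exact and timepoint = 0 as approximate; whether R5 actually reads this field is not confirmed in this thread, so treat this as an assumption. The helper name set_timepoints is hypothetical.

```r
# Hypothetical helper: flag the first and last stop of each trip as exact
# (timepoint = 1) and all intermediate stops as approximate (timepoint = 0).
set_timepoints <- function(stop_times) {
  # Sort so that "first" and "last" follow stop_sequence within each trip.
  st <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]
  first <- !duplicated(st$trip_id)
  last <- !duplicated(st$trip_id, fromLast = TRUE)
  st$timepoint <- as.integer(first | last)
  st
}
```

Note that the returned table is re-sorted by trip_id and stop_sequence.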
Hi @cseveren. Thanks for the support and feedback. Yes, you're right. My suggestion above does not work correctly because the gtfs2gps package currently only operates on the departure_time. We should be able to change that soon, though.
In the meantime, I believe your suggestion should work well as a temporary fix.
One simple short-term solution to this problem would be to:
1. Generate a new stop_times table using the gtfs2gps package.
2. Add an arrival_time column. This can be calculated by subtracting, for example, 30 seconds from departure_time, i.e. the amount of time each vehicle waits at each stop for passengers to board and alight.
There's another issue with the solution proposed above that uses gtfs2gps; it is discussed in https://github.com/ipeaGIT/gtfs2gps/issues/213. The problem is that the trip_ids generated by gtfs2gps are not unique across trips.
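Both points raised above about the gtfs2gps route can be sketched in base R: deriving arrival_time by subtracting a fixed dwell time (30 seconds in the suggestion) from departure_time, and checking whether trip_id values are reused across trips by looking for repeated (trip_id, stop_sequence) pairs. This assumes departure_time is stored as "HH:MM:SS" text (hours may exceed 23); gtfs2gps may store times differently, and all function names here are hypothetical.

```r
# Parse "HH:MM:SS" strings (hours may exceed 23) into seconds after midnight.
time_to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Format seconds after midnight as "HH:MM:SS", keeping hours >= 24 intact.
secs_to_time <- function(s) {
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600), as.integer(s %% 3600 %/% 60), as.integer(s %% 60))
}

# Hypothetical helper: arrival_time = departure_time minus a fixed dwell time
# (clamped at midnight so no negative times are produced).
add_arrival_time <- function(stop_times, dwell_secs = 30) {
  dep <- time_to_secs(stop_times$departure_time)
  stop_times$arrival_time <- secs_to_time(pmax(dep - dwell_secs, 0))
  stop_times
}

# Hypothetical check: TRUE when no (trip_id, stop_sequence) pair is repeated,
# i.e. no trip_id appears to be shared by more than one trip.
trip_ids_unique <- function(stop_times) {
  anyDuplicated(stop_times[, c("trip_id", "stop_sequence")]) == 0
}
```

If trip_ids_unique() returns FALSE, the trips would need new, distinct trip_ids before the table is written back to the feed.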
Here's an alternative solution that uses gtfsrouter's frequencies_to_stop_times function and obviates the need to address the above issue. It only requires one small custom function to ensure that hms-style times can exceed 24:00:00 (consistent with GTFS requirements for trips crossing midnight):
library(gtfs2gps)
library(gtfsrouter)
library(dplyr)
library(r5r)
# helper function to correctly format times
hms_gt24 <- function(t){
  paste(formatC(t %/% (60*60), width = 2, format = "d", flag = "0"),
        formatC(t %% (60*60) %/% 60, width = 2, format = "d", flag = "0"),
        formatC(t %% 60, width = 2, format = "d", flag = "0"),
        sep = ":"
  )
}
# gtfs setup
data_path <- system.file("extdata/spo", package = "r5r")
gtfs_file <- system.file("extdata/spo/spo.zip", package="r5r")
# gtfs convert frequencies to stop times
gtfs_list <- extract_gtfs(gtfs_file) %>%
  frequencies_to_stop_times()
gtfs_list$frequencies <- NULL
gtfs_list$stop_times$arrival_time <- hms_gt24(gtfs_list$stop_times$arrival_time)
gtfs_list$stop_times$departure_time <- hms_gt24(gtfs_list$stop_times$departure_time)
# export gtfs
gtfs2gps::write_gtfs(gtfs_list, zipfile = 'new_gtfs.zip')
Just to let you all know, the development version of {gtfstools} also includes a frequencies_to_stop_times() function. It handles times above 24:00:00 and can also be used to convert only a few trips instead of the entire frequencies table.
The entire workflow can be shortened to:
library(gtfstools)
frequencies_gtfs <- read_gtfs("path/to/frequencies/gtfs.zip")
stop_times_gtfs <- frequencies_to_stop_times(frequencies_gtfs)
write_gtfs(stop_times_gtfs, "path/to/stop_times/gtfs.zip")
@dhersz This is great! One question: is your implementation different from (or faster than) the one in gtfs-router (see this issue)? The recursive loop they use is quite slow because it doesn't preallocate space, so at scale each save operation is costly. I suggested a hacky solution on this fork that also doesn't preallocate, but saves in a series of recursive steps.
I looked through your code and the use of templates seems nice, but I couldn't follow where expanded freqs/trips were being saved.
@cseveren The implementation is quite different, as you have correctly noted, and I believe it's considerably faster as well. The updated frequencies and stop times tables are saved into the gtfs object that is returned by the function. You can check that in the code from the "third step" (based on the comments) onwards.
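For readers following the preallocation discussion, the general idea behind a frequencies-to-stop-times conversion can be sketched in base R: treat one trip's stop_times as a template, generate departures at start_time + k * headway_secs within each frequency window, and shift the template by each departure. This is not gtfstools' or gtfsrouter's actual code. It assumes times are integer seconds after midnight (as in gtfsrouter's tables), assumes as a simplification that departures fall strictly before end_time, and uses a hypothetical trip_id naming scheme.

```r
# Hypothetical sketch: expand one template trip into individual trips using
# the rows of frequencies.txt that refer to it.
expand_frequencies <- function(template, frequencies) {
  # Offsets of each stop relative to the template's first departure.
  base <- min(template$departure_time)
  arr_off <- template$arrival_time - base
  dep_off <- template$departure_time - base

  # One departure per headway, from start_time up to (not including) end_time.
  starts <- unlist(Map(function(s, e, h) seq(s, e - 1, by = h),
                       frequencies$start_time,
                       frequencies$end_time,
                       frequencies$headway_secs))

  # Build every trip in a list, then bind once at the end.
  trips <- lapply(seq_along(starts), function(k) {
    data.frame(
      trip_id = paste0(template$trip_id[1], "_", k),  # hypothetical naming scheme
      stop_id = template$stop_id,
      stop_sequence = template$stop_sequence,
      arrival_time = starts[k] + arr_off,
      departure_time = starts[k] + dep_off,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, trips)
}
```

Collecting the expanded trips in a list and calling rbind once avoids growing the table inside the loop, which is the per-save cost mentioned above.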
A small benchmark of gtfsrouter's and gtfstools' functions:
path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
gtfstools_gtfs <- gtfstools::read_gtfs(path)
gtfsrouter_gtfs <- gtfsrouter::extract_gtfs(path, quiet = TRUE)
microbenchmark::microbenchmark(
cvt_gtfstools_gtfs <- gtfstools::frequencies_to_stop_times(gtfstools_gtfs),
cvt_gtfsrouter_gtfs <- gtfsrouter::frequencies_to_stop_times(gtfsrouter_gtfs),
times = 5L
)
#> Unit: milliseconds
#> expr
#> cvt_gtfstools_gtfs <- gtfstools::frequencies_to_stop_times(gtfstools_gtfs)
#> cvt_gtfsrouter_gtfs <- gtfsrouter::frequencies_to_stop_times(gtfsrouter_gtfs)
#> min lq mean median uq max neval
#> 777.2107 793.785 875.153 925.58 937.2485 941.9406 5
#> 26631.9393 31469.455 31593.529 32363.29 32884.5130 34618.4487 5
(Apparently gtfsrouter's function also doesn't update the frequencies table, which seems more like a bug than behaviour that should be relied upon.)
cvt_gtfstools_gtfs$frequencies
#> NULL
cvt_gtfsrouter_gtfs$frequencies
#> trip_id start_time end_time headway_secs
#> 1: CPTM L07-0 14400 17940 720
#> 2: CPTM L07-0 18000 21540 360
#> 3: CPTM L07-0 21600 25140 360
#> 4: CPTM L07-0 25200 28740 360
#> 5: CPTM L07-0 28800 32340 360
#> ---
#> 700: 5290-10-1 79200 82740 1200
#> 701: 5290-10-1 82800 86340 1200
#> 702: 6450-51-0 18000 21540 3600
#> 703: 6450-51-0 21600 25140 3600
#> 704: 6450-51-0 25200 28740 3600
@dhersz checks out on my end, works great! thanks.
We believe the best solution here is to point users to the gtfstools solution above. Closing this issue for now.
Could you please share a reproducible example of the error?
Hello @rafapereirabr, I have revised my code and my input data, and I get another error (below). I believe I will just delete my original comment right now to avoid confusion for other users. I will delete this message too, once you have seen it. I will work on debugging my code a bit more before reaching out again if need be.
"Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException"
Thanks @1022SO.
The fact that r5r does not run detailed_itineraries() on frequency-based GTFS feeds is, unfortunately, a limitation upstream in R5. We hope this will be solved soon. In the meantime, one alternative would be to use the gtfs2gps package to generate a new stop_times.txt file and replace it in your GTFS. This example might do the trick: