ipeaGIT / gtfs2gps

Convert GTFS data into a data.table with GPS-like records in R
https://ipeagit.github.io/gtfs2gps/

`gtfs2gps` replicates `trip_id`s across many trips in `$stop_times` #213

Closed cseveren closed 3 years ago

cseveren commented 3 years ago

I think that trip_ids should be unique within $trips, such that pairs of (trip_id, stop_sequence) are unique within $stop_times. However, when using gtfs2gps to convert a frequency-based GTFS feed to a non-frequency-based one (as in https://github.com/ipeaGIT/r5r/issues/181), gtfs2gps expands each trip_id to match the appropriate number of frequency-delineated trips, so many repeated pairs of (trip_id, stop_sequence) end up in $stop_times. This causes errors in feed validation using google/transitfeed, which reports "Timetravel detected!".
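A quick way to see the problem is to count duplicated (trip_id, stop_sequence) pairs in the expanded table. The sketch below uses a toy data frame standing in for $stop_times; the values are made up for illustration:

```r
# Toy stand-in for an expanded $stop_times table: the pairs (28992, 1)
# and (28992, 2) each appear twice, as after a frequency expansion
st <- data.frame(
  trip_id        = c("28992", "28992", "28992", "28992"),
  stop_sequence  = c(1, 2, 1, 2),
  departure_time = c("10:00:00", "10:02:13", "10:02:20", "10:04:33")
)

# mark every repeat of a (trip_id, stop_sequence) pair
dup <- duplicated(st[, c("trip_id", "stop_sequence")])
sum(dup)  # 2 repeated pairs -> the feed fails validation
```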

A loose guess on my end is that new trip_ids need to be created, one for each trip, before being combined with the frequencies in gtfs2gps's conversion, but I'm not sure.
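That guess can be sketched as follows: expand each frequencies.txt row into one departure per headway and mint a fresh trip_id for each departure. This is only an illustration with toy values (a 140-second headway and a hypothetical "#" suffix scheme), not the gtfs2gps internals:

```r
# One frequencies.txt row: trip 28992 runs every 140 s in a toy time window
freq <- list(trip_id      = "28992",
             start_time   = 10 * 3600,            # 10:00:00, in seconds
             end_time     = 10 * 3600 + 3 * 140,  # window holding 4 departures
             headway_secs = 140)

# one departure time per headway
starts <- seq(freq$start_time, freq$end_time, by = freq$headway_secs)

# mint a unique trip_id per departure before writing stop_times
new_ids <- paste0(freq$trip_id, "#", seq_along(starts))
new_ids  # "28992#1" "28992#2" "28992#3" "28992#4"
```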

cseveren commented 3 years ago

Here's a reproducible example:

  1. Download the 17 March 2020 GTFS feed for Mexico City
  2. Place the code below in a script located in the same location as the feed above.
  3. Run the script
```r
library(gtfs2gps)
library(dplyr)
library(data.table)

# read in the feed and convert it to GPS-like records
gtfs_list <- read_gtfs("./gtfs.zip")
new_stop_times <- gtfs2gps(gtfs_list, parallel = TRUE, spatial_resolution = 500)

# keep only observations with a stop_id and copy departure_time to arrival_time
new_stop_times <- subset(new_stop_times, !is.na(stop_id))
new_stop_times$arrival_time <- as.ITime(new_stop_times$departure_time)

# select columns
new_stop_times <- new_stop_times[, .(trip_id, departure_time, arrival_time, stop_id, stop_sequence)]
head(new_stop_times)

# update stop_times and drop frequencies.txt
gtfs_list$stop_times <- new_stop_times
gtfs_list$frequencies <- NULL

# select a particular non-missing trip_id at random
tt <- gtfs_list$stop_times %>%
  filter(trip_id == "28992")
```

The object tt created above has 4320 rows, corresponding to 180 instances of the trip, each 24 stops long, starting every 2:20 from 10am to about 5:35pm.
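The row count is consistent with the frequency expansion:

```r
# 180 frequency-expanded instances of the trip x 24 stops per instance
180 * 24  # 4320 rows, matching nrow(tt)
```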

cseveren commented 3 years ago

A prior discussion of this topic for OpenTripPlanner that may be useful: https://github.com/opentripplanner/OpenTripPlanner/issues/1347

Moreover, there may already be a useful tool here: https://atfutures.github.io/gtfs-router/reference/frequencies_to_stop_times.html

rafapereirabr commented 3 years ago

The function `frequencies_to_stop_times()` from the `gtfsrouter` package might be a useful reference here.

Joaobazzo commented 3 years ago

Thanks for the issue, @cseveren. Indeed, the trip_id does not change during gtfs2gps::gtfs2gps() processing. That happens because a frequency-based feed carries no prior per-departure trip_id information, unlike simple (non-frequency) GTFS feeds. We're fixing this by combining two columns (trip_id, trip_number) to create a unique trip_id. See the Mexico City example:

```r
> gps_data <- read_gtfs("gtfs.zip") %>%
+   filter_by_shape_id("14816") %>%
+   gtfs2gps()

> tmp_gps <- data.table::copy(gps_data)[!is.na(stop_id) & trip_id == "28992"]
> # number of 'trip_number'
> length(unique(tmp_gps$trip_number))
[1] 180
> # number of 'trip_id'
> length(unique(tmp_gps$trip_id))
[1] 1
> # adjustment
> tmp_gps[, trip_id := paste0(trip_id, "#", trip_number)]
> length(unique(tmp_gps$trip_id))
[1] 180
> # new stop_times
> tmp_gps[, arrival_time := departure_time]
> new_stop_times <- tmp_gps[, .(trip_id, departure_time, arrival_time, stop_id, stop_sequence)]
> head(new_stop_times)
   trip_id departure_time arrival_time stop_id stop_sequence
1: 28992#1       10:00:00     10:00:00   14090             1
2: 28992#1       10:02:13     10:02:13   14089             2
3: 28992#1       10:03:48     10:03:48   14086             3
4: 28992#1       10:05:18     10:05:18   14085             4
5: 28992#1       10:06:56     10:06:56   14093             5
6: 28992#1       10:08:26     10:08:26   14092             6
```
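Applied to a whole feed rather than a single trip, the same trip_id/trip_number concatenation makes every (trip_id, stop_sequence) pair unique. A minimal base-R sketch, with a toy data frame standing in for the gtfs2gps output:

```r
# toy stand-in for gtfs2gps output: two expansions of trip 28992, one of 28993
gps <- data.frame(
  trip_id       = c("28992", "28992", "28993"),
  trip_number   = c(1, 2, 1),
  stop_sequence = c(1, 1, 1)
)

# append trip_number so each frequency-expanded instance gets its own id
gps$trip_id <- paste0(gps$trip_id, "#", gps$trip_number)

# no (trip_id, stop_sequence) pair repeats any more
anyDuplicated(gps[, c("trip_id", "stop_sequence")])  # 0
```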
rafapereirabr commented 3 years ago

Hi @cseveren, we believe this issue has been solved by PR #214. We're closing it for now, but please don't hesitate to reopen it if you think the problem persists in your case.