Open wklumpen opened 2 years ago
We'll need to think carefully about this conversion, as it requires a bit of decision making. In particular:
Hey @peterlai1, I wanted to check if there was any work that you ended up doing on this as a part of getting the GO system up and running.
Just before we have other contributions (from Omar) go too far down the path.
The GO Train network model was built using ATLS data from Metrolinx instead of GTFS, so there wasn't any work done on this front. I did use GTFS as part of the creation process for the animations, though that's probably a separate feature discussion.
Thank you both for setting all this up! These are my thoughts on how we can translate GTFS into spur input:
- Pick a service_id from the GTFS dataset in calendar.txt, which would correspond to a typical day. Ex: focus the analysis on a weekday only.
- The track endpoints (u and v) can be obtained by parsing stop_times.txt, since we can get the order of stations in trips (ordered by stop_sequence). If the script picks up something like A->B->C and C->B->A (A, B, C are stations), then we can assume 2 track operations (?). One track for each direction, so we can assign key 1 to one direction and key 2 to the other direction - this is definitely a big simplification but it's probably a good start.
- traversal_time would be the difference between arrival_time at the latter stop and departure_time at the former stop.
- Distances should come from shapes.txt; the station coordinates are of no use since they only give us the distance as the crow flies.
- stop_times.txt can be used to identify the routes (those would be trip_id in GTFS).
- By parsing stop_times.txt and identifying the repeated routes (in spur lingo) and trips (in GTFS lingo), we can create a set of tours and their corresponding start times.
- Filter routes.txt and only pick up the routes that correspond to rail/subway - that will minimize a lot of noise in datasets that combine rail and bus in one GTFS dataset (such as the TTC).
That was a lot of text, I can try to come up with a prototype - hopefully that would be a bit more digestible.
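A minimal sketch of the rail/subway filter from the last point, assuming routes.txt has already been read into a string. The function name and the sample rows are made up for illustration; the route_type codes (0 = tram/light rail, 1 = subway/metro, 2 = rail, 3 = bus) are the basic ones from the GTFS reference.

```python
import csv
import io

# Basic GTFS route_type codes treated as "rail" here (an assumption about what to keep).
RAIL_ROUTE_TYPES = {"0", "1", "2"}


def rail_route_ids(routes_txt, wanted=RAIL_ROUTE_TYPES):
    """Return the route_id values whose route_type is in `wanted`."""
    reader = csv.DictReader(io.StringIO(routes_txt))
    return {row["route_id"] for row in reader if row["route_type"].strip() in wanted}


# Illustrative rows only, not taken from a real feed.
sample = "route_id,route_short_name,route_type\nR1,Subway line,1\nB1,Bus line,3\n"
print(rail_route_ids(sample))
```

Passing a different `wanted` set would let users widen or narrow the filter for feeds that use other route types.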
Hi Omar, thanks for looking into this!
A few thoughts:
- Use routes.txt for routes, and then trips.txt as a way of stitching tours together. You can use the logic that when a trip ends and another trip begins after that from the terminus station, the vehicle just heads back the other way. That might be a way of cutting down the overall number of required agents.
Regarding tours, the block_id field in trips.txt is perfect for that! Each block_id is associated with a set of trips that are meant to be operated continuously back to back, often by the same vehicle (not necessarily going back in the opposite direction on the same line; it could be operating a different line in the case of interlining). This was actually my original reason for creating the tours entity, as a way to replicate blocks in the GTFS to cut down on redundancy. While different agencies encode their block info slightly differently in GTFS, for the first take of this I'd say doing a one-to-one adaptation of blocks in GTFS as tours in Spur will be adequate.
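As a rough illustration of the one-to-one block-to-tour idea, here is a sketch that groups trips by block_id and orders each group by first departure. The function name and the input shapes (a list of trips.txt rows as dicts, plus a trip_id to first-departure mapping in seconds) are assumptions for this example, not existing Spur code.

```python
from collections import defaultdict


def tours_from_blocks(trips, first_departure):
    """Group trip_ids by block_id, ordered by each trip's first departure time.

    trips: rows from trips.txt as dicts with "trip_id" and "block_id".
    first_departure: mapping of trip_id -> first departure in seconds.
    """
    blocks = defaultdict(list)
    for trip in trips:
        block = trip.get("block_id")
        if block:  # block_id is optional in GTFS, so skip trips without one
            blocks[block].append(trip["trip_id"])
    return {
        block: sorted(ids, key=first_departure.__getitem__)
        for block, ids in blocks.items()
    }


trips = [
    {"trip_id": "t2", "block_id": "100"},
    {"trip_id": "t1", "block_id": "100"},
    {"trip_id": "t3", "block_id": ""},
]
tours = tours_from_blocks(trips, {"t1": 100, "t2": 200, "t3": 50})
```

Trips without a block_id are dropped here; a real converter would need a fallback for them.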
Update:
So I developed something to extract TimedTrack and SimpleStation components - mainly using stop_times.txt since the order of stations is there.
If trips go from station A to B and from B to A, then we're assuming double tracks, so I used key to differentiate them (with values 1 or 2 to make the distinction).
Also, I used the average scheduled time from A to B (difference of arrival time at B and departure time at A) to estimate the value of traversal_time.
Formatting still needs some work, but the idea is there. I pasted below sample input/output, if you want to take a look - feel free to let me know what you think.
Will continue working on the other components.
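A sketch of the logic described in this update, under assumed input shapes: each trip is a list of (stop_id, arrival, departure) tuples in seconds, already sorted by stop_sequence. The output dict layout is illustrative and is not Spur's actual component schema.

```python
from collections import defaultdict


def timed_tracks(trips):
    """Average scheduled traversal time per directed station pair.

    The first direction seen between two stations gets key 1, the reverse key 2.
    """
    samples = defaultdict(list)
    for stops in trips:
        for (u, _, dep), (v, arr, _) in zip(stops, stops[1:]):
            samples[(u, v)].append(arr - dep)  # arrival at v minus departure at u

    tracks = []
    for (u, v), times in samples.items():
        reverse_seen = any(t["u"] == v and t["v"] == u for t in tracks)
        tracks.append({
            "u": u,
            "v": v,
            "key": 2 if reverse_seen else 1,
            "traversal_time": sum(times) / len(times),
        })
    return tracks


trips = [
    [("A", 0, 10), ("B", 70, 80), ("C", 150, 160)],
    [("C", 200, 210), ("B", 270, 280), ("A", 350, 360)],
    [("A", 400, 410), ("B", 480, 490)],
]
tracks = timed_tracks(trips)
```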
That works well Omar - that's exactly what I ended up doing as a one-off for Line 4 Subway in Toronto.
Just to make sure I understood the concept of tours: my understanding is that in the example below (from the sample TTC Line 4), there's a train doing Tour-1971266, and this train does the following:
- departure indicates the departure times at each of these stations
- departure indicates the departure times at each of these stations
Did I get that right?
[
{
"name": "Tour-1971226",
"creation_time": 0,
"deletion_time": 86400,
"routes": [
{
"name": "R-Westbound",
"args": [
{
"departure": 20850
},
null,
{
"departure": 20975
},
null,
{
"departure": 21080
},
null,
{
"departure": 21193
},
null,
{
"departure": 21354
}
]
},
{
"name": "R-Eastbound",
"args": [
{
"departure": 21450
},
null,
{
"departure": 21633
},
null,
{
"departure": 21720
},
null,
{
"departure": 21795
},
null,
{
"departure": 21954
}
]
},
{
"name": "R-Westbound",
"args": [
{
"departure": 22170
},
null,
{
"departure": 22295
},
null,
{
"departure": 22400
},
null,
{
"departure": 22513
},
null,
{
"departure": 22674
}
]
},
Hi Omar, yes that is correct. In each traversal of a route within a tour, each item in the args list corresponds one-to-one to each component listed in the route, in order. In this example, the departure time arguments apply to the station components, and no args (null) are applied to the track components between stations.
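To make the one-to-one pairing concrete, a small sketch: the component names below are hypothetical placeholders, and the two departure values are the first two from the sample above.

```python
def pair_components_with_args(components, args):
    """Pair each route component with its args entry; None means no args."""
    if len(components) != len(args):
        raise ValueError("args must have exactly one entry per route component")
    return list(zip(components, args))


# Hypothetical component names: station, track, station.
components = ["Station-1", "Track-1-2", "Station-2"]
args = [{"departure": 20850}, None, {"departure": 20975}]
pairs = pair_components_with_args(components, args)
```

The length check is one way a converter could catch a malformed tour before running a simulation.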
Thanks @peterlai1
Update: I think the logic works - but the formatting needs a bit more work
I also need to add a few more things, such as ignoring non-train routes in GTFS and, at the end, replacing all stop_id values with actual stop names to make things more human readable
But also, do you have any thoughts regarding fields that require user-input (ex: which "station" corresponds to the yard, and the capacities of the yards, stations, and tracks). I am thinking of setting these as default values for now (so the first component of a route is always a yard, track capacity = 10, and mean boarding/alighting time = 20)
Also, speaking of defaults: I set the tour creation time to zero and deletion time to 2 days (some GTFS datasets go a few hours over 24h but I haven't seen anything go beyond that since it's typically describing the transit schedule for a day)
Please let me know if you had other thoughts/direction on all this
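On the time defaults: GTFS writes times as HH:MM:SS and allows hours past 24 for trips that run after midnight on the same service day, so a converter has to parse them without clock wrap-around. A minimal sketch; the function and constant names are made up for illustration.

```python
SECONDS_PER_DAY = 86400
DEFAULT_DELETION_TIME = 2 * SECONDS_PER_DAY  # the 2-day default discussed above


def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


late_trip = gtfs_time_to_seconds("25:30:00")  # 1:30 am on the following calendar day
```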
I pasted a sneak peek of the input/output below if you want to take a look
Thanks Omar for your great work so far! I like what you have proposed as the default parameter values. For now we can just use that and maybe have some primitive way for users to customize these constants (later on we can think about ability to customize component types and associated params). The deletion time of 2 days works, I don't think I've yet come across a GTFS feed that has a service day longer than 30 hours, so even that might be sufficient as well, though 2 days is definitely safer.
A couple of things regarding the input/outputs:
- In routes.json, trip1 and trip3 traverse through the exact same set of components in the exact same order, so they are duplicates and we should actually only keep one of them. Let's say we keep trip1, then in tour 200, it should use trip1 as the route.
- trip1 is scheduled to arrive at Bessarion later than when the next trip (trip2) is scheduled to leave Bessarion, even though the trips are meant to be connected via the tour.
I see I see. Okay, I updated the script to fix that:
- routes.json now only shows trip1 and trip2
- tours.json now shows trip1 and trip2 under the 100 tour and trip1 under the 200 tour
For your second point, yeah I am averaging the difference of arrival time at next station and departure time at current station for the station pairs for all trips (you can double check the numbers here https://github.com/spur-sim/spur/issues/27#issuecomment-2016908486 and see if that's what you had in mind)
For your third point - that one's on me - I think I brainlessly entered these times, but you're right, the first departure time for trip2 should be after the last arrival time for trip1!
Will keep working on this next week and will keep you posted
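For reference, the route de-duplication discussed above could look roughly like this. It is a sketch under assumed inputs (a mapping of trip_id to its ordered component names), not the code in the script; the station and track names are placeholders.

```python
def dedupe_routes(trip_components):
    """Keep one route per distinct component sequence.

    Returns (routes, route_for_trip): the kept routes keyed by the first
    trip_id seen with that sequence, and a trip_id -> kept route name map.
    """
    canonical = {}  # component sequence -> first trip_id seen with it
    route_for_trip = {}
    for trip_id, components in trip_components.items():
        signature = tuple(components)
        route_for_trip[trip_id] = canonical.setdefault(signature, trip_id)
    routes = {trip_id: list(sig) for sig, trip_id in canonical.items()}
    return routes, route_for_trip


trip_components = {
    "trip1": ["A", "A-B", "B"],
    "trip2": ["B", "B-A", "A"],
    "trip3": ["A", "A-B", "B"],  # same sequence as trip1
}
routes, route_for_trip = dedupe_routes(trip_components)
```

With this mapping, a tour that originally referenced trip3 would reference trip1 as its route instead.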
Check out PR https://github.com/spur-sim/spur/pull/71 !
So - I think there's a few things that make GTFS especially tricky. One of them is bundling into tours, as tours require that the last component of tour n is the first component of tour n+1. To do this properly you'd have to make some assumptions about layovers, etc.
First, I think @peterlai1 is right about the block IDs being useful for tours, but they are not required for basic GTFS. So with that in mind I'm going to suggest a very simple approach that makes one big simplification: Each trip is run by a distinct vehicle. That is, each trip on a route is its own tour.
My thinking of the logic is this: Build the graph first, using routes.txt to identify rail routes (allow folks to filter as needed), then identify all trips in trips.txt using that route, and then use stop_times.txt to build the components network assuming a bidirectional graph. If a stop_id_1 to stop_id_2 link exists already it won't get duplicated. Once the graph is built we can dump the components back into JSON just by iterating through the edges.
Then, we simply create a tour for each trip in trips.txt, create a vehicle in trains.json that is assigned to it, and off we go.
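A rough sketch of that flow, assuming stop_times.txt has already been filtered to rail trips and grouped per trip. Every name and dict layout here is hypothetical and only illustrates the "no duplicate links, one tour and one train per trip" idea; it is not Spur's JSON schema.

```python
def build_network(stop_times):
    """Build de-duplicated directed links plus one tour and one train per trip.

    stop_times: trip_id -> list of (stop_id, departure_seconds), ordered by
    stop_sequence.
    """
    edges = {}
    tours, trains = [], []
    for trip_id, stops in stop_times.items():
        for (u, _), (v, _) in zip(stops, stops[1:]):
            # setdefault leaves an existing u -> v link untouched
            edges.setdefault((u, v), {"u": u, "v": v})
        tour_name = f"Tour-{trip_id}"
        tours.append({"name": tour_name, "departures": [dep for _, dep in stops]})
        trains.append({"name": f"Train-{trip_id}", "tour": tour_name})
    return list(edges.values()), tours, trains


stop_times = {
    "t1": [("A", 0), ("B", 60)],
    "t2": [("A", 600), ("B", 660), ("C", 720)],
}
edges, tours, trains = build_network(stop_times)
```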
Then, if we want to do something more compact and realistic with block IDs we can do that too as a second feature.
I'm going to try and move the logic @omar-kabbani has already wonderfully put together over into this method, and I will probably just incorporate it into a function. I'm thinking it's time for a spur.data module to sit alongside the spur.core module.
Thanks again for everyone's help! I think once we have a basic GTFS converter we'll be able to very rapidly prototype a lot of things, even if there are some simplifications.
Would be good to have an ability to "read" a GTFS feed into a set of useable components based on some specified pre-determined parameters and settings.
The functionality would take one or more GTFS zipfiles and generate components.json, routes.json, and trains.json based on a fairly simple set of criteria. Users could then make modifications to components to more easily set up the network.
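As a starting point for the "read" step, here is a minimal sketch that loads selected GTFS tables from a zipfile using only the standard library. The function name and the default table list are assumptions for illustration, not an existing Spur API.

```python
import csv
import io
import zipfile


def read_gtfs_tables(source, names=("routes.txt", "trips.txt", "stop_times.txt")):
    """Read selected GTFS tables from a zip (path or file-like) into lists of dict rows."""
    tables = {}
    with zipfile.ZipFile(source) as feed:
        available = set(feed.namelist())
        for name in names:
            if name not in available:
                continue  # table is missing from this feed
            with feed.open(name) as raw:
                # utf-8-sig drops a leading byte-order mark if the file has one
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


# Build a tiny in-memory feed to show the call; the row is made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("routes.txt", "route_id,route_type\nR1,1\n")
buffer.seek(0)
tables = read_gtfs_tables(buffer)
```

Handling several zipfiles would then be a matter of calling this once per feed and merging the resulting tables.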