spur-sim / spur

Simulation for Planning and Understanding Railways
MIT License

GTFS to Component and Train Conversion #27

Open wklumpen opened 2 years ago

wklumpen commented 2 years ago

Would be good to have an ability to "read" a GTFS feed into a set of useable components based on some specified pre-determined parameters and settings.

The functionality would take one or more GTFS zip files and generate components.json, routes.json, and trains.json based on a fairly simple set of criteria. Users could then modify the components to set up the network more easily.
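As a starting point, here is a minimal sketch of reading one table out of a GTFS zip file using only the standard library. The function name `read_gtfs_table` is hypothetical and not part of spur; the sketch only assumes that the feed is a zip archive of CSV text files, as GTFS feeds are.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed_path, table_name):
    """Return the rows of one GTFS table (e.g. "stop_times.txt") as a list of dicts.

    `feed_path` may be a path to a GTFS zip file or an open binary file object.
    """
    with zipfile.ZipFile(feed_path) as feed:
        with feed.open(table_name) as raw:
            # utf-8-sig drops the byte-order mark that some feeds include
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

Every value comes back as a string, so fields such as `stop_sequence` still need converting before they are sorted or compared.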

wklumpen commented 1 year ago

We'll need to think carefully about this conversion, as it requires a bit of decision making. In particular:

wklumpen commented 10 months ago

Hey @peterlai1, I wanted to check if there was any work that you ended up doing on this as a part of getting the GO system up and running.

Just before we have other contributions (from Omar) go too far down the path.

peterlai1 commented 10 months ago

> Hey @peterlai1, I wanted to check if there was any work that you ended up doing on this as a part of getting the GO system up and running.
>
> Just before we have other contributions (from Omar) go too far down the path.

The GO Train network model was built using ATLS data from Metrolinx instead of GTFS, so there wasn't any work done on this front. I did use GTFS as part of the creation process for the animations, though that's probably a separate feature discussion.

omar-kabbani commented 10 months ago

Thank you both for setting all this up! These are my thoughts on how we can translate GTFS into spur input

That was a lot of text, I can try to come up with a prototype - hopefully that would be a bit more digestible.

wklumpen commented 10 months ago

Hi Omar, thanks for looking into this!

A few thoughts:

peterlai1 commented 10 months ago

Regarding tours, the block_id field in trips.txt is perfect for that! Each block_id is associated with a set of trips and they are meant to be operated continuously back to back, often by the same vehicle (not necessarily going back in the opposite direction on the same line, could be operating a different line in the case of interlining). This was actually my original reason for creating the tours entity, as a way to replicate blocks in the GTFS to cut down on redundancy. While different agencies encode their block info slightly differently in GTFS, for the first take of this I'd say doing a one-to-one adaptation of blocks in GTFS as tours in Spur will be adequate.
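A minimal sketch of the one-to-one block-to-tour grouping described above, assuming the trips.txt rows have already been read into dicts. The function name `group_trips_by_block` is hypothetical. Because `block_id` is optional in GTFS, the sketch falls back to one group per trip when it is missing, keyed by `trip_id`; this assumes trip IDs and block IDs do not collide.

```python
from collections import defaultdict


def group_trips_by_block(trips):
    """Map each block_id to the list of trip_ids it contains, in input order.

    `trips` is an iterable of dicts with "trip_id" and (optionally) "block_id".
    """
    blocks = defaultdict(list)
    for trip in trips:
        # Assumption: a trip without a block_id becomes its own single-trip group
        block_id = trip.get("block_id") or trip["trip_id"]
        blocks[block_id].append(trip["trip_id"])
    return dict(blocks)
```

Trips within a block would still need to be ordered by their first departure time before being chained into a tour, since trips.txt does not guarantee chronological order.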

omar-kabbani commented 9 months ago

Update: I developed something to extract TimedTrack and SimpleStation components, mainly using stop_times.txt since the order of stations is there. If trips go from station A to B and from B to A, we assume double tracks, so I used key to differentiate them (with values 1 or 2). I also used the average scheduled time from A to B (arrival time at B minus departure time at A) to estimate the value of traversal_time. Formatting still needs some work, but the idea is there. I pasted sample input/output below if you want to take a look; feel free to let me know what you think.

Will continue working on the other components.

Sample Input (stop_times.txt)

```csv
trip_id,arrival_time,departure_time,stop_id,stop_sequence
trip1,07:00:00,07:00:00,yonge,1
trip1,07:05:00,07:05:00,bayview,2
trip1,07:20:00,07:20:00,bessarion,3
trip2,07:15:00,07:15:00,bessarion,1
trip2,07:20:00,07:20:00,bayview,2
trip2,07:30:00,07:30:00,yonge,3
trip3,07:00:00,07:00:00,yonge,1
trip3,07:10:00,07:10:00,bayview,2
trip3,07:30:00,07:30:00,bessarion,3
```

Sample Output

```json
{
  "station0": { "type": "SimpleStation", "name": "yonge_1", "u": "yonge_1", "v": "yonge_2", "key": 1 },
  "station2": { "type": "SimpleStation", "name": "bayview_2", "u": "bayview_1", "v": "bayview_2", "key": 2 },
  "station1": { "type": "SimpleStation", "name": "bayview_1", "u": "bayview_1", "v": "bayview_2", "key": 1 },
  "station3": { "type": "SimpleStation", "name": "bessarion_2", "u": "bessarion_1", "v": "bessarion_2", "key": 2 },
  "stationbessarion_1": { "type": "SimpleStation", "name": "bessarion_1", "u": "bessarion_1", "v": "bessarion_2", "key": 1 },
  "stationyonge_2": { "type": "SimpleStation", "name": "yonge_2", "u": "yonge_1", "v": "yonge_2", "key": 2 }
}

{
  "edge0": { "type": "TimedTrack", "u": "yonge_2", "v": "bayview_1", "key": 1, "traversal_time": 450.0 },
  "edge2": { "type": "TimedTrack", "u": "yonge_2", "v": "bayview_1", "key": 2, "traversal_time": 600.0 },
  "edge1": { "type": "TimedTrack", "u": "bayview_2", "v": "bessarion_1", "key": 1, "traversal_time": 1050.0 },
  "edge3": { "type": "TimedTrack", "u": "bayview_2", "v": "bessarion_1", "key": 2, "traversal_time": 300.0 }
}
```
wklumpen commented 9 months ago

That works well Omar - that's exactly what I ended up doing as a one-off for Line 4 Subway in Toronto.

omar-kabbani commented 8 months ago

Just to make sure I understood the concept of tours: my understanding is that in the example below (from the sample TTC Line 4), there's a train doing Tour-1971226, and this train does the following:

  1. Heads westbound and makes 5 stops (Don Mills, Leslie, Bessarion, Bayview, and Yonge) - and departure indicates the departure times at each of these stations
  2. Then heads eastbound and makes these 5 stops (Yonge, Bayview, Bessarion, Leslie, and Don Mills) - and departure indicates the departure times at each of these stations
  3. Heads westbound again (same as step 1)

Did I get that right?

```json
[
  {
    "name": "Tour-1971226",
    "creation_time": 0,
    "deletion_time": 86400,
    "routes": [
      {
        "name": "R-Westbound",
        "args": [
          { "departure": 20850 },
          null,
          { "departure": 20975 },
          null,
          { "departure": 21080 },
          null,
          { "departure": 21193 },
          null,
          { "departure": 21354 }
        ]
      },
      {
        "name": "R-Eastbound",
        "args": [
          { "departure": 21450 },
          null,
          { "departure": 21633 },
          null,
          { "departure": 21720 },
          null,
          { "departure": 21795 },
          null,
          { "departure": 21954 }
        ]
      },
      {
        "name": "R-Westbound",
        "args": [
          { "departure": 22170 },
          null,
          { "departure": 22295 },
          null,
          { "departure": 22400 },
          null,
          { "departure": 22513 },
          null,
          { "departure": 22674 }
        ]
      },
```
peterlai1 commented 8 months ago

Hi Omar, yes, that is correct. In each traversal of a route within a tour, each item in the args list corresponds one-to-one to a component listed in the route, in order. In this example, the departure time arguments apply to the station components, and no args (null) are applied to the track components between stations.
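To illustrate the one-to-one correspondence, here is a small sketch that builds an args list for a route whose components alternate station, track, station, and so on. The helper name `build_route_args` is hypothetical and not part of spur; it only assumes the pattern shown in the example above, where stations take a departure time and tracks take no arguments.

```python
def build_route_args(departures):
    """Build a tour `args` list for a route that alternates station, track, station...

    Each station receives {"departure": seconds}; each track receives None.
    """
    args = []
    for index, departure in enumerate(departures):
        if index > 0:
            args.append(None)  # the track between the previous station and this one
        args.append({"departure": departure})
    return args
```

For the first westbound run in the example, `build_route_args([20850, 20975, 21080, 21193, 21354])` yields nine entries: five station dicts with four `None` values between them, matching the five stations and four tracks of the route.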

omar-kabbani commented 8 months ago

Thanks @peterlai1

Update: I think the logic works - but the formatting needs a bit more work

I also need to add a few more things, such as ignoring non-train routes in GTFS and, at the end, replacing all stop_id values with actual stop names to make things more human-readable

Also, do you have any thoughts regarding fields that require user input (e.g. which "station" corresponds to the yard, and the capacities of the yards, stations, and tracks)? I am thinking of setting these to default values for now (so the first component of a route is always a yard, track capacity = 10, and mean boarding/alighting time = 20)

Also, speaking of defaults: I set the tour creation time to zero and deletion time to 2 days (some GTFS datasets go a few hours over 24h but I haven't seen anything go beyond that since it's typically describing the transit schedule for a day)

Please let me know if you had other thoughts/direction on all this

I pasted a sneak peek of the input/output below if you want to take a look

Sample Input

`trips.txt` (Ignore that this is a bus route for now)

```csv
route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,wheelchair_accessible,bikes_allowed
A,1,trip1,EAST - 10 VAN HORNE towards VICTORIA PARK,,0,100,998007,1,1
A,1,trip2,EAST - 10 VAN HORNE towards VICTORIA PARK,,1,100,998007,1,1
A,1,trip3,EAST - 10 VAN HORNE towards VICTORIA PARK,,0,200,998007,1,1
```

`stop_times.txt`

```csv
trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
trip1,07:00:00,07:00:00,yonge,1
trip1,07:05:00,07:05:00,bayview,2
trip1,07:20:00,07:20:00,bessarion,3
trip2,07:15:00,07:15:00,bessarion,1
trip2,07:20:00,07:20:00,bayview,2
trip2,07:30:00,07:30:00,yonge,3
trip3,07:00:00,07:00:00,yonge,1
trip3,07:10:00,07:10:00,bayview,2
trip3,07:30:00,07:30:00,bessarion,3
```
Sample Output

`components.json`

```json
[
  { "type": "SimpleStation", "u": "yonge_1", "v": "yonge_2", "key": 1 },
  { "type": "SimpleStation", "u": "bayview_1", "v": "bayview_2", "key": 1 },
  { "type": "SimpleStation", "u": "bessarion_1", "v": "bessarion_2", "key": 1 },
  { "type": "SimpleStation", "u": "bessarion_1", "v": "bessarion_2", "key": 2 },
  { "type": "SimpleStation", "u": "bayview_1", "v": "bayview_2", "key": 2 },
  { "type": "SimpleStation", "u": "yonge_1", "v": "yonge_2", "key": 2 },
  { "type": "TimedTrack", "u": "yonge_2", "v": "bayview_1", "key": 1, "traversal_time": 450.0 },
  { "type": "TimedTrack", "u": "bayview_2", "v": "bessarion_1", "key": 1, "traversal_time": 1050.0 },
  { "type": "TimedTrack", "u": "bessarion_2", "v": "bayview_1", "key": 2, "traversal_time": 300.0 },
  { "type": "TimedTrack", "u": "bayview_2", "v": "yonge_1", "key": 2, "traversal_time": 600.0 }
]
```

`routes.json`

```json
[
  {
    "name": "trip1",
    "components": [
      { "u": "yonge_1", "v": "yonge_2", "key": 1 },
      { "u": "yonge_2", "v": "bayview_1", "key": 1 },
      { "u": "bayview_1", "v": "bayview_2", "key": 1 },
      { "u": "bayview_2", "v": "bessarion_1", "key": 1 },
      { "u": "bessarion_1", "v": "bessarion_2", "key": 1 }
    ]
  },
  {
    "name": "trip2",
    "components": [
      { "u": "bessarion_2", "v": "bessarion_1", "key": 2 },
      { "u": "bessarion_1", "v": "bayview_2", "key": 2 },
      { "u": "bayview_2", "v": "bayview_1", "key": 2 },
      { "u": "bayview_1", "v": "yonge_2", "key": 2 },
      { "u": "yonge_2", "v": "yonge_1", "key": 2 }
    ]
  },
  {
    "name": "trip3",
    "components": [
      { "u": "yonge_1", "v": "yonge_2", "key": 1 },
      { "u": "yonge_2", "v": "bayview_1", "key": 1 },
      { "u": "bayview_1", "v": "bayview_2", "key": 1 },
      { "u": "bayview_2", "v": "bessarion_1", "key": 1 },
      { "u": "bessarion_1", "v": "bessarion_2", "key": 1 }
    ]
  }
]
```

`tours.json`

```json
[
  {
    "name": "100",
    "creation_time": 0,
    "deletion_time": 172800,
    "routes": [
      {
        "name": "trip1",
        "args": [ { "departure": 25200 }, null, { "departure": 25500 }, null, { "departure": 26400 } ]
      },
      {
        "name": "trip2",
        "args": [ { "departure": 26100 }, null, { "departure": 26400 }, null, { "departure": 27000 } ]
      }
    ]
  },
  {
    "name": "200",
    "creation_time": 0,
    "deletion_time": 172800,
    "routes": [
      {
        "name": "trip3",
        "args": [ { "departure": 25200 }, null, { "departure": 25800 }, null, { "departure": 27000 } ]
      }
    ]
  }
]
```
peterlai1 commented 8 months ago

Thanks Omar for your great work so far! I like what you have proposed as the default parameter values. For now we can just use those and maybe have some primitive way for users to customize these constants (later on we can think about the ability to customize component types and associated params). The deletion time of 2 days works. I don't think I've come across a GTFS feed with a service day longer than 30 hours, so even that might be sufficient, though 2 days is definitely safer.
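One primitive way to expose these constants could be a small settings object with the proposed values as defaults. This is only a sketch: the class and field names are hypothetical, and the values are the ones proposed in this thread (track capacity 10, mean boarding/alighting time 20, creation time 0, deletion time of 2 days).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionDefaults:
    """Hypothetical container for the default constants proposed in this thread."""

    track_capacity: int = 10              # proposed default track capacity
    mean_boarding_time: float = 20.0      # proposed mean boarding/alighting time
    creation_time: int = 0                # tour creation time, in seconds
    deletion_time: int = 2 * 86400        # tour deletion time: 2 days, in seconds
    first_component_is_yard: bool = True  # treat the first component of a route as a yard
```

A user who prefers the tighter 30-hour bound could then pass `ConversionDefaults(deletion_time=30 * 3600)` without touching the other values.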

A couple of things regarding the input/outputs:

omar-kabbani commented 8 months ago

I see, I see. Okay, I updated the script to fix that.

For your second point, yeah I am averaging the difference of arrival time at next station and departure time at current station for the station pairs for all trips (you can double check the numbers here https://github.com/spur-sim/spur/issues/27#issuecomment-2016908486 and see if that's what you had in mind)

For your third point - that one's on me - I think I brainlessly entered these times, but you're right, the first departure time for trip2 should be after the last arrival time for trip1!

Will keep working on this next week and will keep you posted

omar-kabbani commented 6 months ago

Check out PR https://github.com/spur-sim/spur/pull/71 !

wklumpen commented 5 months ago

So, I think there are a few things that make GTFS especially tricky. One of them is bundling into tours, as tours require that the last component of route n is the first component of route n+1. To do this properly you'd have to make some assumptions about layovers, etc.

First, I think @peterlai1 is right about the block IDs being useful for tours, but they are not required for basic GTFS. So with that in mind I'm going to suggest a very simple approach that makes one big simplification: Each trip is run by a distinct vehicle. That is, each trip on a route is its own tour.

My thinking on the logic is this: build the graph first, using routes.txt to identify rail routes (allowing folks to filter as needed), then identify all trips in trips.txt that use those routes, and then use stop_times.txt to build the component network, assuming a bidirectional graph. If a stop_id_1 to stop_id_2 link already exists, it won't get duplicated. Once the graph is built, we can dump the components back into JSON just by iterating through the edges.
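The steps above could be sketched roughly as follows, using plain sets rather than a graph library to keep the example self-contained. The names `RAIL_ROUTE_TYPES` and `build_edges` are hypothetical. The `route_type` codes 0, 1, and 2 are the GTFS values for tram/light rail, subway/metro, and rail. The sketch treats A to B and B to A as separate directed links, which is one possible reading of the bidirectional graph described above.

```python
RAIL_ROUTE_TYPES = {"0", "1", "2"}  # GTFS route_type: tram/light rail, subway/metro, rail


def build_edges(routes, trips, stop_times, route_types=RAIL_ROUTE_TYPES):
    """Return the set of directed (from_stop, to_stop) links used by the selected routes.

    All arguments are lists of dicts of strings, as read from the GTFS text files.
    """
    route_ids = {r["route_id"] for r in routes if r["route_type"] in route_types}
    trip_ids = {t["trip_id"] for t in trips if t["route_id"] in route_ids}

    by_trip = {}
    for row in stop_times:
        if row["trip_id"] in trip_ids:
            by_trip.setdefault(row["trip_id"], []).append(row)

    edges = set()
    for rows in by_trip.values():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        for here, there in zip(rows, rows[1:]):
            # adding to a set means an existing link is never duplicated
            edges.add((here["stop_id"], there["stop_id"]))
    return edges
```

Passing a different `route_types` set is one way to let users filter the routes that get converted. Each edge in the result would then map to a track component, with a station component per stop.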

Then, we simply create a tour for each trip in trips.txt, create a vehicle in trains.json that is assigned to it, and off we go.

Then, if we want to do something more compact and realistic with block IDs, we can do that too as a second feature.

I'm going to try and move the logic @omar-kabbani has already wonderfully put together over into this method, and I will probably just incorporate it into a function. I'm thinking it's time for a spur.data module to sit alongside the spur.core module.

Thanks again for everyone's help! I think once we have a basic GTFS converter we'll be able to very rapidly prototype a lot of things, even if there are some simplifications.