hasadna / open-bus

:bus: Analysing Israel's public transport data

The real GTFS (based on SIRI) project #270

Open EyalBerger opened 5 years ago

EyalBerger commented 5 years ago

A quick summary of our latest discussions. It needs further editing, so please add your insights.

We want to present a real, accurate bus timetable based on SIRI data.

The main pipeline includes:

  1. Defining a SiriRide (#174):

    • for now, we decided to use a simple index composed of route_id, planned_start_datetime and bus_id.
    • further investigation is needed to check whether this definition applies to all bus operators (with special attention to Dan rides).
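
The index from item 1 could be sketched as a small hashable key, which also makes it easy to group raw SIRI rows into rides. This is only an illustration: the class name `SiriRideKey`, the helper `group_by_ride`, the field types and the string layout are assumptions, not something decided in the discussion.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SiriRideKey:
    """Hypothetical composite key built from the three fields named in item 1."""
    route_id: int
    planned_start_datetime: datetime
    bus_id: str

    def as_string(self) -> str:
        # A "meaningful" string form of the key (the layout is an assumption).
        return f"{self.route_id}_{self.planned_start_datetime:%Y%m%dT%H%M%S}_{self.bus_id}"


def group_by_ride(rows):
    """Group raw SIRI rows (dicts) into lists of rows that share one key."""
    rides = {}
    for row in rows:
        key = SiriRideKey(row["route_id"], row["planned_start_datetime"], row["bus_id"])
        rides.setdefault(key, []).append(row)
    return rides
```

If some operator reuses a bus_id or reports a different planned start for the same physical ride, this grouping would split or merge rides incorrectly, which is exactly what the open question about Dan rides would need to rule out.
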
  2. Building SiriRide, AgencyDay and RouteDay classes for storing SIRI and GTFS data in JSON format.

    • a first script is ready (Python).
    • stops_ids & stops_latlon attributes still need to be added to RouteDay.
    • the data types of the class attributes need to be settled.
    • index: random ID (integer) or meaningful ID (string)?
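
To make the open points in item 2 concrete, here is a minimal sketch of two of the classes with JSON serialization. It is not the existing script: apart from route_id, planned_start_datetime, bus_id, stops_ids and stops_latlon, every field name and type below is an assumption, and `doc_id` only illustrates the "meaningful ID (string)" option.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Tuple


@dataclass
class RouteDay:
    route_id: int
    date: str  # ISO date such as "2019-03-01" (assumed representation)
    stops_ids: List[int] = field(default_factory=list)
    stops_latlon: List[Tuple[float, float]] = field(default_factory=list)

    @property
    def doc_id(self) -> str:
        # Meaningful string ID; a random integer ID is the alternative.
        return f"{self.route_id}_{self.date}"


@dataclass
class SiriRide:
    route_id: int
    planned_start_datetime: str  # ISO datetime (assumed representation)
    bus_id: str
    points: List[dict] = field(default_factory=list)  # raw SIRI samples, shape assumed


def to_json(obj) -> str:
    """Serialize any of the dataclasses above to a JSON string."""
    return json.dumps(asdict(obj), sort_keys=True)
```

An AgencyDay class could follow the same pattern. Keeping dates as ISO strings avoids a custom JSON encoder, at the cost of parsing them again on read.
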
  3. Developing a Splunk procedure: a daily process for writing SIRI and route_stats (GTFS) data to the classes mentioned above.

    • how to avoid using the same rows for two consecutive days?
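
One possible answer to the question in item 3, sketched in Python rather than Splunk: assign every row to exactly one service day according to its planned start (not the time it was recorded), and drop exact repeats. This is a suggestion, not an agreed design, and the `recorded_at` field name is hypothetical.

```python
from datetime import date, datetime


def rows_for_service_day(rows, service_day: date):
    """Keep rows whose planned start falls on service_day, without duplicates."""
    seen = set()
    selected = []
    for row in rows:
        # A ride that crosses midnight stays with the day it was planned to start.
        if row["planned_start_datetime"].date() != service_day:
            continue
        fingerprint = (
            row["route_id"],
            row["bus_id"],
            row["planned_start_datetime"],
            row["recorded_at"],  # hypothetical name for the sample timestamp
        )
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        selected.append(row)
    return selected
```

Because the day is derived from a single field, running the function for two consecutive days can never select the same row twice.
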
  4. Storing SIRI and GTFS data in Google Firestore, using the class format mentioned above.

    • add the relevant RouteDay instance to each SiriRide instance as a list (for future use; for now the list will include only one RouteDay instance).
    • add the permanent GTFS indexes to SiriRide instances: route_mkt, route_direction, route_alternative (joined on route_id and date).
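
The join described in item 4 can be sketched with plain dictionaries, independent of how Firestore is eventually accessed. The function name `attach_gtfs_indexes` and the `route_days` key are assumptions; rides and route_stats rows are assumed to be dicts that both carry route_id and date.

```python
GTFS_INDEXES = ("route_mkt", "route_direction", "route_alternative")


def attach_gtfs_indexes(siri_rides, route_stats):
    """Return copies of the rides enriched with GTFS indexes, joined on (route_id, date)."""
    lookup = {(r["route_id"], r["date"]): r for r in route_stats}
    enriched = []
    for ride in siri_rides:
        stats = lookup.get((ride["route_id"], ride["date"]))
        merged = dict(ride)
        for name in GTFS_INDEXES:
            merged[name] = stats[name] if stats else None
        # Stored as a list so that more than one RouteDay can be attached later.
        merged["route_days"] = [stats] if stats else []
        enriched.append(merged)
    return enriched
```

Rides with no matching route_stats row keep None for the three indexes, so unmatched rides stay visible instead of being silently dropped.
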
  5. Classifying SIRI rides as "good" or "bad" (a ride is "good" when it matches the GTFS route).

    • a simple model based on a specific SiriRide instance and its related RouteDay instance is ready: SiriRideAnalyzer (Python).
    • a connection to Firestore needs to be developed.
    • small adjustments to SiriRideAnalyzer will be needed once the SiriRide class is finalized.
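
As an illustration of what "matching the GTFS route" in item 5 could mean, the sketch below marks a ride as "good" when the bus was observed near a large enough share of the route's stops. It is not SiriRideAnalyzer, whose logic is not described here; the function names and both thresholds are assumptions chosen only for the example.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def classify_ride(points, stops_latlon, max_dist_m=150.0, min_share=0.8):
    """Return "good" if at least min_share of the stops have a SIRI point nearby."""
    if not points or not stops_latlon:
        return "bad"
    visited = sum(
        1 for stop in stops_latlon
        if min(haversine_m(stop, p) for p in points) <= max_dist_m
    )
    return "good" if visited / len(stops_latlon) >= min_share else "bad"
```

With sparse SIRI sampling a bus can pass a stop between two samples, so the distance threshold would need tuning against real data before this could be trusted.
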
  6. Estimating bus stop times per SiriRide.

    • for now, @EyalBerger will develop a very simple model, and will then move on to a more accurate one.
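
One candidate for the "very simple model" in item 6 is to give each stop the timestamp of the SIRI sample recorded closest to it. This is only a guess at what a first version might look like; the function names and the `(timestamp, (lat, lon))` sample shape are assumptions.

```python
import math
from datetime import datetime


def approx_dist_m(a, b):
    """Equirectangular distance in meters, adequate over a few kilometers."""
    lat1, lon1 = a
    lat2, lon2 = b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000 * math.hypot(x, y)


def estimate_stop_times(samples, stops_latlon):
    """samples: list of (timestamp, (lat, lon)). Returns one timestamp per stop."""
    estimates = []
    for stop in stops_latlon:
        # Nearest-sample rule: no interpolation between consecutive samples.
        timestamp, _ = min(samples, key=lambda s: approx_dist_m(s[1], stop))
        estimates.append(timestamp)
    return estimates
```

A more accurate version could interpolate between the two samples that bracket each stop along the route shape, and would also have to handle routes that pass the same location twice.
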
  7. Predicting the bus timetable using historical data.

  8. Publishing the real bus timetable in a cool and friendly way.