hasadna / open-bus

:bus: Analysing Israel's public transport data

Stride - GTFS ETL #339

Closed OriHoch closed 3 years ago

OriHoch commented 3 years ago

We should have all the GTFS data available in the Stride DB, so we can join it with the SIRI data

See the detailed spec: https://docs.google.com/document/d/1LcGlK0BfJ2C2jE0O0oDBjeidG8KfdNMpGH_bMskdrOc/edit?usp=sharing

The ETL should load the GTFS data and update the relevant Stride tables according to the spec.
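To make the goal concrete, here is a minimal sketch of the kind of join this would enable once the GTFS data sits next to the SIRI data. It uses an in-memory SQLite database purely for illustration. The table names (`gtfs_route`, `siri_ride`), the column names, and the sample rows are all hypothetical and are not the actual Stride DB schema, which is defined by the spec.

```python
import sqlite3

# Hypothetical, simplified schema: every table and column name below is an
# illustrative assumption, not the real Stride DB schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gtfs_route (
        line_ref INTEGER,
        operator_ref INTEGER,
        service_date TEXT,
        route_short_name TEXT
    );
    CREATE TABLE siri_ride (
        id INTEGER PRIMARY KEY,
        line_ref INTEGER,
        operator_ref INTEGER,
        scheduled_date TEXT
    );
    """
)

# Made-up sample rows: two rides match a GTFS route, one has no GTFS match.
conn.executemany(
    "INSERT INTO gtfs_route VALUES (?, ?, ?, ?)",
    [(1001, 3, "2021-11-01", "480"), (1002, 5, "2021-11-01", "5")],
)
conn.executemany(
    "INSERT INTO siri_ride (line_ref, operator_ref, scheduled_date) VALUES (?, ?, ?)",
    [(1001, 3, "2021-11-01"), (1001, 3, "2021-11-01"), (9999, 3, "2021-11-01")],
)


def rides_with_route_names(connection):
    """Attach the GTFS route name to each SIRI ride, keeping unmatched rides."""
    query = """
        SELECT s.id, s.line_ref, g.route_short_name
        FROM siri_ride AS s
        LEFT JOIN gtfs_route AS g
          ON g.line_ref = s.line_ref
         AND g.operator_ref = s.operator_ref
         AND g.service_date = s.scheduled_date
        ORDER BY s.id
    """
    return connection.execute(query).fetchall()


rows = rides_with_route_names(conn)
```

The `LEFT JOIN` keeps SIRI rides that have no GTFS counterpart (their route name comes back as `None`), which makes gaps in the loaded GTFS data easy to spot.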

Implementation notes

Part of epic: https://github.com/hasadna/open-bus/issues/335

AvivSela commented 3 years ago

Documentation can be found here: https://github.com/hasadna/open-bus-gtfs-etl/wiki

As part of this task, I created a simplified version of the gtfs-stat script that takes a date and the GTFS files and returns trip and route stats for the given date.

Here is the signature of the main function: https://github.com/hasadna/open-bus-gtfs-etl/blob/main/open_bus_gtfs_etl/gtfs_stat/gtfs_stats.py

def analyze_gtfs_date(date_to_analyze: date, gtfs_file_path: Path, tariff_file_path: Path,
                      cluster_to_line_file_path: Path, trip_id_to_date_file_path: Path) -> Tuple[DataFrame, DataFrame]:
    """
    Aggregate GTFS data of single date into trip-stat and route stat DataFrames
    """