Closed OriHoch closed 3 years ago
Documentation can be found here: https://github.com/hasadna/open-bus-gtfs-etl/wiki
As part of this task I created a simplified version of the gtfs-stat script that just takes a date and GTFS files and returns trip and route stats for the given date.
Here is the signature of the main function: https://github.com/hasadna/open-bus-gtfs-etl/blob/main/open_bus_gtfs_etl/gtfs_stat/gtfs_stats.py
def analyze_gtfs_date(date_to_analyze: date, gtfs_file_path: Path, tariff_file_path: Path,
                      cluster_to_line_file_path: Path, trip_id_to_date_file_path: Path) -> Tuple[DataFrame, DataFrame]:
    """
    Aggregate GTFS data of single date into trip-stat and route stat DataFrames
    """
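To give a rough idea of what "trip stats" and "route stats" mean here, the following is a minimal sketch using only the standard library. It is not the code in gtfs_stats.py: the function name `sketch_trip_and_route_stats` and the output field names are made up for illustration, the rows are assumed to be already filtered to the date being analyzed, and times are assumed to be zero-padded `HH:MM:SS` strings so they can be compared as text. Only the input column names (`trip_id`, `route_id`, `stop_sequence`, `arrival_time`, `departure_time`) come from the GTFS trips.txt and stop_times.txt files.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sketch_trip_and_route_stats(
    trips: List[dict], stop_times: List[dict]
) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Hypothetical toy aggregation: per-trip stats, then per-route stats."""
    # Group stop_times rows by the trip they belong to.
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)

    # One stats record per trip that has at least one stop time.
    trip_stats: Dict[str, dict] = {}
    for trip in trips:
        stops = sorted(by_trip.get(trip["trip_id"], []),
                       key=lambda r: int(r["stop_sequence"]))
        if not stops:
            continue
        trip_stats[trip["trip_id"]] = {
            "route_id": trip["route_id"],
            "start_time": stops[0]["departure_time"],
            "end_time": stops[-1]["arrival_time"],
            "num_stops": len(stops),
        }

    # Roll the trip records up to one record per route.
    route_stats: Dict[str, dict] = {}
    for stat in trip_stats.values():
        route = route_stats.setdefault(
            stat["route_id"],
            {"num_trips": 0, "first_start": stat["start_time"]},
        )
        route["num_trips"] += 1
        route["first_start"] = min(route["first_start"], stat["start_time"])
    return trip_stats, route_stats


# Made-up sample rows, already restricted to a single date.
trips = [
    {"trip_id": "t1", "route_id": "r1"},
    {"trip_id": "t2", "route_id": "r1"},
    {"trip_id": "t3", "route_id": "r2"},  # no stop times, so it is skipped
]
stop_times = [
    {"trip_id": "t1", "stop_sequence": "1", "arrival_time": "08:00:00", "departure_time": "08:00:00"},
    {"trip_id": "t1", "stop_sequence": "2", "arrival_time": "08:10:00", "departure_time": "08:10:00"},
    {"trip_id": "t2", "stop_sequence": "2", "arrival_time": "07:50:00", "departure_time": "07:50:00"},
    {"trip_id": "t2", "stop_sequence": "1", "arrival_time": "07:30:00", "departure_time": "07:30:00"},
    {"trip_id": "t2", "stop_sequence": "3", "arrival_time": "08:05:00", "departure_time": "08:05:00"},
]
trip_stats, route_stats = sketch_trip_and_route_stats(trips, stop_times)
```

The real function returns two pandas DataFrames and also takes the tariff, cluster-to-line and trip-id-to-date files, none of which are modelled in this sketch.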
We should have all the GTFS data available in the Stride DB, so that we can join it with the SIRI data.
See the detailed spec for more details: https://docs.google.com/document/d/1LcGlK0BfJ2C2jE0O0oDBjeidG8KfdNMpGH_bMskdrOc/edit?usp=sharing
The ETL should load the GTFS data and update the relevant Stride tables with it, according to the spec.
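As a rough illustration of the load-then-join idea, here is a minimal sketch using an in-memory SQLite database. The table names (`gtfs_route`, `siri_ride`), their columns, and the function `load_and_join` are all hypothetical and are not taken from the actual Stride DB schema or from the spec; the real tables and join keys are defined in the spec linked above.

```python
import sqlite3
from typing import List, Tuple


def load_and_join(gtfs_routes: List[tuple], siri_rides: List[tuple]) -> List[Tuple]:
    """Load GTFS route rows, then join them to SIRI rides by date and line ref."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables; the real Stride schema may differ.
    conn.execute(
        "CREATE TABLE gtfs_route ("
        " date TEXT, line_ref INTEGER, route_short_name TEXT,"
        " PRIMARY KEY (date, line_ref))"
    )
    conn.execute(
        "CREATE TABLE siri_ride (id INTEGER PRIMARY KEY, date TEXT, line_ref INTEGER)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the same date is re-run.
    conn.executemany("INSERT OR REPLACE INTO gtfs_route VALUES (?, ?, ?)", gtfs_routes)
    conn.executemany("INSERT INTO siri_ride VALUES (?, ?, ?)", siri_rides)
    # LEFT JOIN so SIRI rides without matching GTFS data still show up.
    rows = conn.execute(
        "SELECT s.id, s.date, g.route_short_name"
        " FROM siri_ride s"
        " LEFT JOIN gtfs_route g"
        "   ON g.date = s.date AND g.line_ref = s.line_ref"
        " ORDER BY s.id"
    ).fetchall()
    conn.close()
    return rows


# Made-up sample rows.
gtfs_routes = [("2022-01-01", 100, "5"), ("2022-01-01", 200, "18")]
siri_rides = [(1, "2022-01-01", 100), (2, "2022-01-01", 300)]
joined = load_and_join(gtfs_routes, siri_rides)
```

In this sketch the second SIRI ride has no GTFS match, so its route name comes back as `None`; whether such rows should be kept or dropped is a decision for the spec, not something this example settles.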
Implementation notes
open-bus-gtfs-etl
(part of epic: https://github.com/hasadna/open-bus/issues/335)