hasadna / open-bus

:bus: Analysing Israel's public transport data

gtfs-etl: add documentation to the code #346

Closed OriHoch closed 3 years ago

OriHoch commented 3 years ago

Please keep all documentation as part of the code, since the two are related and are updated together.

Currently, I think not much more detail is available in the docs, so nothing needs to be changed; maybe just update the help messages of the CLI commands.

I updated the README to conform to the other stride repos and to include all relevant details for use with Docker Compose or for local development.

OriHoch commented 3 years ago

Installation

  1. Clone both repositories: open-bus-gtfs-etl and open-bus-stride-db:

    • git clone https://github.com/hasadna/open-bus-gtfs-etl.git

    • git clone https://github.com/hasadna/open-bus-stride-db.git

  2. Create a virtual environment and activate it:

    • python3.8 -m venv open-bus-gtfs-etl/venv

    • . open-bus-gtfs-etl/venv/bin/activate

  3. Install the required packages, then install the open-bus-stride-db package in editable mode:

    • python -m pip install -r open-bus-stride-db/requirements.txt

    • python -m pip install -r open-bus-gtfs-etl/requirements.txt

    • python -m pip install -e open-bus-stride-db

  4. Create a .env file with the database connection string:

    • echo "export SQLALCHEMY_URL=postgresql://postgres:123456@localhost" > open-bus-gtfs-etl/.env

Prepare environment

Before executing the app, activate the virtual environment and source the .env file (run these from inside the open-bus-gtfs-etl directory):

. venv/bin/activate

. .env
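
When the app is started from somewhere the shell cannot source `.env` (an IDE run configuration, for example), the same file can be read from Python instead. The following is only a minimal sketch, assuming the file holds plain `export KEY=VALUE` lines like the one written during installation. `parse_env_lines` and `load_env_file` are hypothetical helpers, not part of open-bus-gtfs-etl.

```python
import os


def parse_env_lines(lines):
    """Parse shell-style `export KEY=VALUE` lines into a dict.

    Blank lines and lines starting with '#' are skipped. One pair of
    matching single or double quotes around the value is removed.
    """
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path=".env"):
    """Hypothetical helper: read `path` and copy its variables into os.environ."""
    with open(path, encoding="utf-8") as f:
        values = parse_env_lines(f)
    os.environ.update(values)
    return values
```

Calling `load_env_file("open-bus-gtfs-etl/.env")` before importing the app would make `SQLALCHEMY_URL` visible to it. This sketch does not expand shell variables or handle multi-line values, which sourcing the file in a shell would.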

Executing the app using CLI

python -m open_bus_gtfs_etl.cli --help

Usage: python -m open_bus_gtfs_etl.cli [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

Commands:

  `analyze-gtfs-stat`       Analyze GTFS files into stat route and trips

  `create-gtfs-metadata`   Create metadata file of existed GTFS files

  `download-gtfs`           Download GTFS file from MOT FTP server

  `upload-gtfs-stat-to-db`  Main endpoint for GTFS ETL.

Commands & Phases

This ETL can be summarized in 3 main steps:

  1. Download the GTFS files from the FTP server of MOT (Ministry of Transport and Road Safety)
  2. Aggregate the raw GTFS data so that it describes routes
  3. Upload the data into a PostgreSQL database

Three of the commands listed in the CLI help above each correspond to one of these steps:

  1. download-gtfs - downloads the GTFS files from the FTP server and creates a JSON metadata file next to them.
  2. analyze-gtfs-stat - takes the path of the GTFS metadata file and performs aggregations that create 2 files: route-stat and trip-stat.
  3. upload-gtfs-stat-to-db - if called without any parameters, downloads the GTFS files, performs the aggregations and uploads the data into the DB. If you have already downloaded the GTFS files or created the aggregation reports, you can skip those stages.
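
The stage-skipping behaviour described for upload-gtfs-stat-to-db can be sketched as follows. This is an illustration of the control flow only, not the project's actual code: `run_etl` and its three callables are hypothetical stand-ins for the real stages, and the two stat files are assumed to be represented by a single path.

```python
from pathlib import Path
from typing import Callable, List, Optional


def run_etl(
    download: Callable[[], Path],
    analyze: Callable[[Path], Path],
    upload: Callable[[Path], None],
    gtfs_metadata_path: Optional[Path] = None,
    stat_path: Optional[Path] = None,
) -> List[str]:
    """Run download -> analyze -> upload, skipping stages whose output is given.

    All three callables are hypothetical stand-ins for the real stages.
    Returns the names of the stages that actually ran, in order.
    """
    ran = []
    if stat_path is None:
        # No aggregation output supplied, so it has to be produced.
        if gtfs_metadata_path is None:
            # No downloaded GTFS data supplied either: start from the download.
            gtfs_metadata_path = download()
            ran.append("download")
        stat_path = analyze(gtfs_metadata_path)
        ran.append("analyze")
    upload(stat_path)
    ran.append("upload")
    return ran
```

With no optional arguments all three stages run; passing `gtfs_metadata_path` skips the download, and passing `stat_path` skips both the download and the aggregation.
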
OriHoch commented 3 years ago

I disabled the wiki feature, copied the page here for reference ^

AvivSela commented 3 years ago

I added a new section about the 3 operations the app should support and about the environment variables that the app uses: https://github.com/hasadna/open-bus-gtfs-etl#supported-operations-and-configurations