Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we integrate third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) Points of Interest from OpenStreetMap, c) Google Popular Times.
Currently, the CLI is implemented as a local Python script. Dockerizing the CLI reduces the local setup time and lowers the chances of errors.
Previously, we tried to run the CLI as a Docker container, which created a "Docker in Docker" scenario that didn't work because of faulty volume mounting (see #99).
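One possible way around the "Docker in Docker" problem is to mount the host's Docker socket into the CLI container so the CLI starts the pipeline containers as siblings on the host daemon. Below is a minimal sketch of that idea using the `docker` Python SDK; the image names, paths, and entrypoint are placeholders, not the current implementation:

```python
# Sketch: the CLI container talks to the *host* Docker daemon through the
# mounted socket instead of running a nested daemon. Requires starting the
# CLI container with: -v /var/run/docker.sock:/var/run/docker.sock
import docker

client = docker.from_env()

container = client.containers.run(
    "kuwala/osm-poi:latest",        # placeholder image name
    command="python3 src/main.py",  # placeholder entrypoint
    volumes={
        # Note: the key must be a *host* path, because the host daemon
        # resolves bind mounts relative to the host filesystem (the root
        # cause of the faulty mounts in the nested setup).
        "/home/user/kuwala/tmp/kuwala": {
            "bind": "/opt/kuwala/tmp/kuwala",
            "mode": "rw",
        },
    },
    detach=True,
)

container.wait()                     # block until the pipeline finishes
print(container.logs().decode())    # surface pipeline output in the CLI
```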
Alternatively, we could try to use Airflow for orchestrating the pipeline runs.
The following user story provides some background on the purpose of the CLI:
User Story
As a user, I want a CLI to select all the pipelines I'd like to run. I can choose the geographical region for which I want to run the pipelines. In the case of the population data, I also want to select specific demographic groups I am interested in.
Once I have made all my selections, the CLI will run all the pipelines in the correct order (e.g., the google-poi pipeline depends on the osm-poi pipeline; see the ordering sketch after the user story). Once the data pipelines have run successfully, all the data should be imported into the Postgres database.
When all the data has been imported, the Jupyter environment should be launched so I can start working with the data conveniently.
In addition to running the individual data pipelines, I want to be able to download the demo data through the CLI. Once the demo data is downloaded, the database and the Jupyter notebook with the popularity correlation should be launched.
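To illustrate the dependency handling mentioned in the user story, here is a minimal sketch of how the CLI could derive a run order from a pipeline dependency map. The pipeline names and the dependency map are illustrative assumptions, not the actual configuration:

```python
# Sketch: order selected pipelines so that dependencies run first
# (e.g., google-poi must run after osm-poi).
from graphlib import TopologicalSorter  # Python 3.9+

# pipeline -> set of pipelines it depends on (illustrative)
DEPENDENCIES = {
    "osm-poi": set(),
    "google-poi": {"osm-poi"},
    "population-density": set(),
}

def run_order(selected):
    """Return the selected pipelines in a dependency-respecting order."""
    graph = {p: DEPENDENCIES[p] & selected for p in selected}
    return list(TopologicalSorter(graph).static_order())

if __name__ == "__main__":
    print(run_order({"google-poi", "osm-poi"}))  # ['osm-poi', 'google-poi']
```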