Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we integrate third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) Points of Interest from OpenStreetMap, c) Google Popular Times.
Currently, the CLI is implemented as a local Python script. Dockerizing the CLI reduces the local setup time and lowers the chances of errors.
Previously, we tried to run the CLI as a Docker container, which created a "Docker in Docker" scenario that didn't work because of faulty volume mounting (see #99).
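One possible way around the "Docker in Docker" problem is to mount the host's Docker socket into the CLI container so the CLI starts the pipeline containers as siblings on the host daemon. Below is a minimal sketch of that idea using the `docker` Python SDK; the image names, paths, and entrypoint are placeholders, not the current implementation:

```python
# Sketch: the CLI container talks to the *host* Docker daemon through the
# mounted socket instead of running a nested daemon. Requires starting the
# CLI container with: -v /var/run/docker.sock:/var/run/docker.sock
import docker

client = docker.from_env()

container = client.containers.run(
    "kuwala/osm-poi:latest",        # placeholder image name
    command="python3 src/main.py",  # placeholder entrypoint
    volumes={
        # Note: the key must be a *host* path, because the host daemon
        # resolves bind mounts relative to the host filesystem (the root
        # cause of the faulty mounts in the nested setup).
        "/home/user/kuwala/tmp/kuwala": {
            "bind": "/opt/kuwala/tmp/kuwala",
            "mode": "rw",
        },
    },
    detach=True,
)

container.wait()                     # block until the pipeline finishes
print(container.logs().decode())    # surface pipeline output in the CLI
```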
Alternatively, we could try to use Airflow for orchestrating the pipeline runs.
The following user story provides some background on the purpose of the CLI:
User Story
As a user, I want a CLI to select all the pipelines I'd like to run. I can choose the geographical region for which I want to run the pipelines. In the case of the population data, I also want to select specific demographic groups I am interested in.
Once I have made all my selections, the CLI will run all the pipelines in the correct order (e.g., the google-poi pipeline depends on the osm-poi pipeline; see the ordering sketch after the user story). Once the data pipelines have run successfully, all the data should be imported into the Postgres database.
When all the data has been imported, the Jupyter environment should be launched so I can start working with the data conveniently.
In addition to running the individual data pipelines, I want to be able to download the demo data through the CLI. Once the demo data is downloaded, the database and the Jupyter notebook with the popularity correlation should be launched.
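To illustrate the dependency handling mentioned in the user story, here is a minimal sketch of how the CLI could derive a run order from a pipeline dependency map. The pipeline names and the dependency map are illustrative assumptions, not the actual configuration:

```python
# Sketch: order selected pipelines so that dependencies run first
# (e.g., google-poi must run after osm-poi).
from graphlib import TopologicalSorter  # Python 3.9+

# pipeline -> set of pipelines it depends on (illustrative)
DEPENDENCIES = {
    "osm-poi": set(),
    "google-poi": {"osm-poi"},
    "population-density": set(),
}

def run_order(selected):
    """Return the selected pipelines in a dependency-respecting order."""
    graph = {p: DEPENDENCIES[p] & selected for p in selected}
    return list(TopologicalSorter(graph).static_order())

if __name__ == "__main__":
    print(run_order({"google-poi", "osm-poi"}))  # ['osm-poi', 'google-poi']
```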