Aviation provides the only rapid worldwide transportation network, making it essential for global business. It generates economic growth, creates jobs, and facilitates international trade and tourism. The air transport industry supports 65.5 million jobs globally, 10.2 million of them direct.
This project aims to analyse the impact of Covid-19 on the aviation industry. It was also a great opportunity to develop skills and experience with a range of tools such as Apache Airflow, Apache Spark, Tableau, and several AWS cloud services.
Airflow orchestrates the following tasks:
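The actual task list lives in covid_flights_dag.py. Conceptually, orchestration means running tasks in dependency order; below is a framework-free sketch of that idea using the standard library's `graphlib`. The task names are illustrative placeholders, not the actual tasks in covid_flights_dag.

```python
# A framework-free sketch of dependency-ordered execution, the core idea
# behind an Airflow DAG run. Task names are hypothetical placeholders.
from graphlib import TopologicalSorter

# task -> set of upstream tasks that must finish first (illustrative names)
deps = {
    "download_data": set(),
    "upload_to_s3": {"download_data"},
    "spark_transform": {"upload_to_s3"},
    "load_to_athena": {"spark_transform"},
}

def run_order(dependencies):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(dependencies).static_order())
```

Airflow does the same thing at scale, additionally handling scheduling, retries, and logging for each task.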
The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019.
To avoid out-of-memory issues, the data is loaded into the S3 bucket incrementally.
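A minimal sketch of what such an incremental load can look like: the file is read in fixed-size chunks so the whole dataset never has to fit in memory, and each chunk becomes its own S3 object. The prefix, key pattern, and chunk size here are illustrative, not the project's actual configuration; the boto3 upload is kept in a separate function so the chunking logic stands alone.

```python
import csv
import io
from typing import Iterator, Tuple

def iter_csv_chunks(fileobj, prefix: str, chunk_rows: int = 100_000) -> Iterator[Tuple[str, bytes]]:
    """Yield (s3_key, payload) pairs, each holding at most `chunk_rows`
    data rows plus the header row, so no chunk holds the whole file."""
    reader = csv.reader(fileobj)
    header = next(reader)
    part, rows = 0, []
    for row in reader:
        rows.append(row)
        if len(rows) >= chunk_rows:
            yield _to_payload(prefix, part, header, rows)
            part, rows = part + 1, []
    if rows:
        yield _to_payload(prefix, part, header, rows)

def _to_payload(prefix, part, header, rows):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return f"{prefix}/part-{part:05d}.csv", buf.getvalue().encode("utf-8")

def upload_incrementally(csv_path: str, bucket: str, prefix: str) -> None:
    """Stream chunks of a local CSV into S3, one object per chunk."""
    import boto3  # local import: the chunking above works without AWS
    s3 = boto3.client("s3")
    with open(csv_path, newline="") as f:
        for key, payload in iter_csv_chunks(f, prefix):
            s3.put_object(Bucket=bucket, Key=key, Body=payload)
```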
Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent Lenders
"Crowdsourced air traffic data from the OpenSky Network 2019–2020"
Earth System Science Data 13(2), 2021
This dataset includes time-series data tracking the number of people affected by COVID-19 worldwide.
The data is in CSV format and contains a list of all airport codes.
The dataset contains country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements. [ISO 3166-1]
The dataset was used to map countries with continents.
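The mapping step is essentially a join on the ISO 3166-1 alpha-2 code. A minimal, dependency-free sketch of the idea (the field names and the tiny inline tables are illustrative, not the actual dataset contents):

```python
# Join country records to continents on the ISO 3166-1 alpha-2 code.
# Field names and sample values below are hypothetical.
def add_continent(records, iso_to_continent):
    """Attach a 'continent' field to each country record; codes missing
    from the lookup fall back to 'Unknown'."""
    return [
        {**rec, "continent": iso_to_continent.get(rec["alpha2"], "Unknown")}
        for rec in records
    ]

countries = [
    {"name": "France", "alpha2": "FR"},
    {"name": "India", "alpha2": "IN"},
]
continents = {"FR": "Europe", "IN": "Asia"}
```

In practice the same join can be done with a pandas merge or a Spark join; the fallback value matters because some territory codes have no continent entry.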
Contains ISO-3 codes and names of Indian States.
Find the entire analysis here
Clone the repository and cd into the project directory.
git clone https://github.com/siddharth271101/Covid-19-and-Aviation-Industry.git
cd Covid-19-and-Aviation-Industry
Note: Replace {your-bucket-name} in setup.sh, covid_flights_etl.py and covid_flights_dag.py before proceeding with the steps mentioned below.
Once the virtual environment is activated, run the following command:
$ pip install -r requirements.txt
Download the data and create an S3 bucket by running setup.sh as shown below:

sh setup.sh

setup.sh also starts incrementally loading the OpenSky data into the S3 bucket. After setup.sh runs successfully, start the Docker container using the following command:
docker compose -f docker-compose-LocalExecutor.yml up -d
We use the following Docker containers:
Open the Airflow UI by visiting http://localhost:8080 in a browser, and start the covid_flights_dag DAG.
Once the DAG run is successful, check the output folder of the S3 bucket.
This blog explains in detail how to build a Tableau dashboard using Athena as a data source.