The repository has the following structure:

Directory | Description |
---|---|
(root) | Batch scripts to execute the pipeline. |
data | Source data sets used in the project. |
doc | Resources related to this technical report. |
processed-data | Working directory of the data pipeline, where processed data is stored. |
resources | Resources used by the data pipeline: stopword definitions and skill vocabulary definitions. |
src | Python source code for the data pipeline. |
visualisation | Working directory for the produced visualisations. |

The data transformation pipeline can be executed with the run-data-transformation-jobs script in the root directory.