Reason (Why?)
Right now the ETL consists of the xml-import and the elasticsearch-upload steps. To have an automated ETL, these steps need to be bundled into one process.
Solution (What?)
First clean up the etl directory tree. It contains many unneeded files left over from the openartbrowser project, which can be removed. Afterwards the etl directory tree can be flattened.
Create an etl-setup.sh script that sets up and installs the required Python environment:
Create a virtual environment: virtualenv venv
Install the Python packages from requirements.txt into the virtual environment
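A minimal sketch of what etl-setup.sh could look like. The issue mentions `virtualenv venv`; this sketch uses Python's built-in venv module as an equivalent that needs no extra install, and the directory name is taken from the issue text:

```shell
#!/usr/bin/env bash
# etl-setup.sh -- sketch of the setup script (details are assumptions)
set -euo pipefail

VENV_DIR="venv"

# Create the virtual environment only if it does not exist yet
if [ ! -d "$VENV_DIR" ]; then
    python3 -m venv "$VENV_DIR"
fi

# Install the pinned python packages into the virtual environment
if [ -f requirements.txt ]; then
    "$VENV_DIR/bin/pip" install -r requirements.txt
else
    echo "warning: requirements.txt not found, skipping package install" >&2
fi

echo "setup complete: activate with 'source $VENV_DIR/bin/activate'"
```

Calling pip through `venv/bin/pip` (rather than activating first) keeps the setup script idempotent and safe to re-run on the server.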
Create an etl wrapper script (e.g. like etl.sh in the openartbrowser project) that takes care of the following steps:
Use the virtual python environment: source venv/bin/activate
Handle paths of input xml-files and output json files
Run the xml-importer with these input xml-files and store the json files
Run the elasticsearch_uploader.py script with the generated json files.
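The wrapper steps above could be sketched roughly as follows. The script name `xml_importer.py`, the command-line flags, and the default data paths are assumptions for illustration, not the project's actual layout; only `elasticsearch_uploader.py` and `source venv/bin/activate` come from the issue text:

```shell
#!/usr/bin/env bash
# etl.sh -- hypothetical wrapper script; file names, flags, and default
# paths below are assumptions about the repository layout.
set -euo pipefail

XML_DIR="${1:-data/xml}"    # input xml files (assumed default path)
JSON_DIR="${2:-data/json}"  # output json files (assumed default path)

# Step 1: use the virtual python environment (guarded so the sketch can
# be dry-run before etl-setup.sh has been executed)
if [ -f venv/bin/activate ]; then
    source venv/bin/activate
fi

# Step 2: handle the input/output paths
mkdir -p "$JSON_DIR"

# Step 3: run the xml-importer on the input files, writing json output
if [ -f xml_importer.py ]; then
    python xml_importer.py --input "$XML_DIR" --output "$JSON_DIR"
fi

# Step 4: upload the generated json files into a new elasticsearch index
if [ -f elasticsearch_uploader.py ]; then
    python elasticsearch_uploader.py --input "$JSON_DIR"
fi

echo "etl run finished: json output in $JSON_DIR"
```

Taking the paths as positional arguments with defaults lets the same script serve both the staging and the production server.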
Relation to other Issues
This issue is part of #3
Acceptance criteria
The etl.sh wrapper script can be executed on the staging (and production) server and creates a new elasticsearch index from the newly parsed xml files.