hochschule-darmstadt-UAS / ddk-artbrowser

Exploring the world of arts using open data
http://openartbrowser.org/
MIT License

Create ETL Wrapper Script #65

mauamy closed 3 years ago

mauamy commented 3 years ago

Reason (Why?) Right now the ETL consists of two separate steps: the xml-import and the elasticsearch-upload. To have an automated ETL, these steps need to be bundled into one process.

Solution (What?) First clean up the etl directory tree. It contains many files from the openartbrowser project that are no longer needed and can be removed. Afterwards we can flatten the etl directory tree.

Create an etl-setup.sh script that sets up and installs the required python environment:

  1. Create virtual python environment: virtualenv venv
  2. Install the python packages from requirements.txt into the virtual environment
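
The two setup steps above could be sketched roughly as follows. This is only an illustration, not the script that was eventually merged: the function name setup_etl_env, the run helper and the DRY_RUN switch are hypothetical, and it assumes virtualenv is on the PATH and that requirements.txt lies in the directory the script is run from.

```shell
#!/bin/sh
# etl-setup.sh (sketch): create the virtual environment and install packages.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell so that `set -eu` stays local to this function.
# Arguments (both optional): venv directory, requirements file.
setup_etl_env() (
    set -eu
    venv_dir="${1:-venv}"
    requirements="${2:-requirements.txt}"

    # 1. Create the virtual python environment unless it already exists.
    if [ ! -d "$venv_dir" ]; then
        run virtualenv "$venv_dir"
    fi

    # 2. Install the python packages using the environment's own pip,
    #    so the environment does not have to be activated first.
    run "$venv_dir/bin/pip" install -r "$requirements"
)
```

The script would end with a call to setup_etl_env; running it as DRY_RUN=1 first only prints the commands, which is handy for checking paths on the staging server before anything is installed.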

Create an etl wrapper script (e.g. like etl.sh in the openartbrowser project) that takes care of the following steps:

  1. Use the virtual python environment: source venv/bin/activate
  2. Handle paths of input xml-files and output json files
  3. Run the xml-importer with these input xml-files and store the json files
  4. Run the elasticsearch_uploader.py script with the generated json files.
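
A minimal sketch of such a wrapper, following the four steps above, might look like this. Several names are assumptions made for illustration only: the importer file name xml_importer.py, the positional arguments passed to both python scripts, the function name run_etl, and the run helper with its DRY_RUN switch. Only elasticsearch_uploader.py and venv/bin/activate are taken from this issue.

```shell
#!/bin/sh
# etl.sh (sketch): run xml import and elasticsearch upload as one process.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell, so `set -e` and the activated environment
# do not leak into the calling shell.
run_etl() (
    set -e
    if [ "$#" -ne 2 ]; then
        echo "usage: run_etl INPUT_XML_DIR OUTPUT_JSON_DIR" >&2
        exit 2
    fi
    input_dir="$1"
    output_dir="$2"

    # 1. Use the virtual python environment.
    run . venv/bin/activate

    # 2. Handle paths of the input xml files and the output json files.
    if [ ! -d "$input_dir" ]; then
        echo "input directory not found: $input_dir" >&2
        exit 1
    fi
    run mkdir -p "$output_dir"

    # 3. Run the xml-importer (assumed script name and argument order).
    run python xml_importer.py "$input_dir" "$output_dir"

    # 4. Upload the generated json files (argument is an assumption).
    run python elasticsearch_uploader.py "$output_dir"
)
```

The script would end with run_etl "$@". Because of set -e, a failing import stops the run before the upload step, so a broken parse does not end up in a new index.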

Relation to other Issues This issue is part of #3

Acceptance criteria The etl.sh wrapper script can be executed on the staging (and production) server and creates a new elasticsearch index from the newly parsed xml files.

mauamy commented 3 years ago

Done in #70