This repository contains the data integration code for the BELTRANS project, which studies intra-Belgian translation flows between French and Dutch in the period 1970-2020. Data from heterogeneous sources, including XML files and existing large RDF dumps, is integrated to create a FAIR corpus.
Preprocessing scripts for each data source are stored in the respective data-source folder. These are mainly Python scripts, but also include RML mapping documents. The data integration is currently controlled by a bash script in the data-integration folder.
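
To give a rough idea of what a preprocessing step for an XML source could look like, here is a minimal sketch that turns XML records into N-Triples lines using only the Python standard library. It is not taken from this repository: the XML layout (`records`, `record`, `id`, `title`, `language`), the `http://example.org/` namespace, and the function names are all hypothetical placeholders, and the actual scripts and RML mappings may work quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; element and attribute names are illustrative only.
SAMPLE_XML = """<records>
  <record id="b1"><title>Example title A</title><language>nl</language></record>
  <record id="b2"><title>Example title B</title><language>fr</language></record>
</records>"""

# Placeholder namespace, not the one used by the project.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def xml_to_ntriples(xml_text):
    """Return one N-Triples line per non-empty child element of each record."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter("record"):
        subject = f"<{BASE}book/{record.get('id')}>"
        for child in record:
            text = (child.text or "").strip()
            if not text:
                continue  # skip empty elements instead of emitting empty literals
            predicate = f"<{BASE}{child.tag}>"
            triples.append(f'{subject} {predicate} "{escape_literal(text)}" .')
    return triples


if __name__ == "__main__":
    for line in xml_to_ntriples(SAMPLE_XML):
        print(line)
```

Producing a line-based RDF serialization per source keeps each preprocessing step independent, so a controlling script can simply concatenate or load the outputs in a later integration step.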