As part of the contract to develop transport-data/tdc-data-portal, the contractor wrote some “data ingestion scripts”. These are two directories in that repo:
Unfortunately:

- The scripts for the JRC IDEES source and the Eurostat provider duplicate the contents of `transport_data.jrc` and `transport_data.estat`; the script for GFEI data seems to be a fixed mirror (i.e. not reusable) of the GFEI Zenodo record.
- The code seems extremely verbose (the JRC file is 4400 lines without formatting; `data-integration/process_tdc.py` is 20000 lines) and involves a lot of duplication/copy-and-paste.
- SDMX metadata are not generated; metadata are fed directly into CKAN via API calls.
More info:
- The scripts do serve as a complete, working example of how to interact with CKAN through its APIs, though by using `requests` directly rather than a CKAN API client (#3); a minimal sketch of this pattern follows the list.
- According to the contractor, the scripts either create records or skip those that already exist; they do not update metadata on existing records if it has changed.
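For reference, here is a minimal sketch of that pattern: calling the CKAN Action API directly with `requests`. The portal URL, API token, and dataset fields are placeholders, not values taken from the scripts.

```python
import requests

CKAN_URL = "https://ckan.example.org"  # placeholder, not the actual portal URL
API_KEY = "..."  # a CKAN API token, sent via the Authorization header


def ckan_action(action: str, payload: dict) -> dict:
    """Call a CKAN Action API endpoint and return its "result" field."""
    response = requests.post(
        f"{CKAN_URL}/api/3/action/{action}",
        json=payload,
        headers={"Authorization": API_KEY},
    )
    response.raise_for_status()
    return response.json()["result"]


# Create a dataset record; the field names follow the standard CKAN package schema
ckan_action(
    "package_create",
    {"name": "example-dataset", "title": "Example dataset", "owner_org": "example-org"},
)
```

A CKAN API client (#3) would replace a hand-rolled wrapper like `ckan_action()` with calls provided by a maintained package.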
To resolve, likely in multiple issues/PRs:
- Integrate the functions of the scripts into existing modules in the current package.
- Replace the workflow that calls the scripts with a workflow calling, e.g., `tdc jrc refresh-ckan`.
- Add functionality to identify existing records and update them as needed; a sketch of this create-or-update behaviour follows the list.
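As a rough illustration of the last item, a sketch of create-or-update ("upsert") behaviour against the CKAN Action API; `package_show`, `package_create`, and `package_patch` are standard CKAN actions, while the helper name and the field-diffing logic are illustrative assumptions, not part of the existing scripts.

```python
import requests

CKAN_URL = "https://ckan.example.org"  # placeholder
API_KEY = "..."


def upsert_dataset(metadata: dict) -> dict:
    """Create the dataset if it does not exist; otherwise patch only changed fields."""
    headers = {"Authorization": API_KEY}

    # package_show responds with HTTP 404 if no dataset has this name
    shown = requests.post(
        f"{CKAN_URL}/api/3/action/package_show",
        json={"id": metadata["name"]},
        headers=headers,
    )

    if shown.status_code == 404:
        action, payload = "package_create", metadata
    else:
        shown.raise_for_status()
        current = shown.json()["result"]
        # Send only the fields that differ, so unchanged records are not rewritten
        payload = {k: v for k, v in metadata.items() if current.get(k) != v}
        if not payload:
            return current  # metadata unchanged; nothing to do
        payload["id"] = current["id"]
        action = "package_patch"

    response = requests.post(
        f"{CKAN_URL}/api/3/action/{action}", json=payload, headers=headers
    )
    response.raise_for_status()
    return response.json()["result"]
```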