As part of the contract to develop transport-data/tdc-data-portal, the contractor wrote some “data ingestion scripts”. These are two directories in that repo:
Unfortunately:

- The scripts for the JRC IDEES source and the Eurostat provider duplicate the contents of `transport_data.jrc` and `transport_data.estat`; the script for GFEI data seems to be a fixed mirror (i.e. not reusable) of the GFEI Zenodo record.
- The code seems extremely verbose (the JRC file is 4400 lines without formatting; `data-integration/process_tdc.py` is 20000 lines) and involves a lot of duplication/copy-and-paste.
- SDMX metadata are not generated; metadata are fed directly into CKAN via API calls.
More info:
- The scripts do serve as a complete, working example of how to interact with CKAN through its APIs, though by using `requests` directly rather than a CKAN API client (#3); a minimal sketch of this pattern follows the list.
- According to the contractor, the scripts either create records or skip those that already exist; they do not update metadata on existing records if it has changed.
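For reference, here is a minimal sketch of that pattern: calling the CKAN Action API directly with `requests`. The portal URL, API token, and dataset fields are placeholders, not values taken from the scripts.

```python
import requests

CKAN_URL = "https://ckan.example.org"  # placeholder, not the actual portal URL
API_KEY = "..."  # a CKAN API token, sent via the Authorization header


def ckan_action(action: str, payload: dict) -> dict:
    """Call a CKAN Action API endpoint and return its "result" field."""
    response = requests.post(
        f"{CKAN_URL}/api/3/action/{action}",
        json=payload,
        headers={"Authorization": API_KEY},
    )
    response.raise_for_status()
    return response.json()["result"]


# Create a dataset record; the field names follow the standard CKAN package schema
ckan_action(
    "package_create",
    {"name": "example-dataset", "title": "Example dataset", "owner_org": "example-org"},
)
```

A CKAN API client (#3) would replace a hand-rolled wrapper like `ckan_action()` with calls provided by a maintained package.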
To resolve, likely in multiple issues/PRs:
- Integrate the functions of the scripts into existing modules in the current package.
- Replace the workflow that calls the scripts with a workflow calling, e.g., `tdc jrc refresh-ckan`.
- Add functionality to identify existing records and update them as needed; a sketch of this create-or-update behaviour follows the list.
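As a rough illustration of the last item, a sketch of create-or-update ("upsert") behaviour against the CKAN Action API; `package_show`, `package_create`, and `package_patch` are standard CKAN actions, while the helper name and the field-diffing logic are illustrative assumptions, not part of the existing scripts.

```python
import requests

CKAN_URL = "https://ckan.example.org"  # placeholder
API_KEY = "..."


def upsert_dataset(metadata: dict) -> dict:
    """Create the dataset if it does not exist; otherwise patch only changed fields."""
    headers = {"Authorization": API_KEY}

    # package_show responds with HTTP 404 if no dataset has this name
    shown = requests.post(
        f"{CKAN_URL}/api/3/action/package_show",
        json={"id": metadata["name"]},
        headers=headers,
    )

    if shown.status_code == 404:
        action, payload = "package_create", metadata
    else:
        shown.raise_for_status()
        current = shown.json()["result"]
        # Send only the fields that differ, so unchanged records are not rewritten
        payload = {k: v for k, v in metadata.items() if current.get(k) != v}
        if not payload:
            return current  # metadata unchanged; nothing to do
        payload["id"] = current["id"]
        action = "package_patch"

    response = requests.post(
        f"{CKAN_URL}/api/3/action/{action}", json=payload, headers=headers
    )
    response.raise_for_status()
    return response.json()["result"]
```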