Why?
Until now, the dataset amalgamator (or 'dataset-combiner') was a collection of Python scripts living in a separate branch of the etlocal repository. This made the scripts hard to maintain and hard to integrate with the rest of the application. Rewriting them in Ruby addresses both problems.
What?
This PR rewrites the dataset-combiner from Python to Ruby and thereby integrates it into the existing etlocal Ruby/Rails codebase. Furthermore:
When datasets are combined, the new code requires a source_data_year argument, which is used to ensure that all source datasets have at least the given analysis year.
If a weighted average is specified for a key (or set of keys) and the combined value for those keys is 0.0, the combination_method falls back to 'average' instead of 'weighted average'. This fixes #464.
Data checks were added to verify that the values in the given source datasets are valid.
If something goes wrong, the script attempts to return a human-readable warning or error message, which makes debugging interface element YAML files or datasets much easier.
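The weighted-average fallback in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `combine` method and its arguments are hypothetical, and it assumes that the 0.0 check applies to the summed weights.

```ruby
# Hypothetical sketch: the method name and signature are illustrative only
# and are not taken from this PR.
#
# values  - Array of Floats, one value per source dataset
# weights - Array of Floats, one weight per source dataset
#
# Returns a two-element Array: the method actually used and the combined value.
def combine(values, weights)
  total_weight = weights.sum

  # Assumption: a combined weight of 0.0 would make the weighted average
  # undefined (division by zero), so fall back to a plain average.
  if total_weight.zero?
    [:average, values.sum / values.size.to_f]
  else
    weighted_sum = values.zip(weights).sum { |value, weight| value * weight }
    [:weighted_average, weighted_sum / total_weight]
  end
end
```

Returning the method that was actually used alongside the value makes it possible to report the fallback to the user instead of applying it silently.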
How?
This PR introduces 3 new Ruby classes:
DatasetCombiner: validates input and delegates the operations to two subclasses that do the actual heavy lifting:
ValueProcessor: Combines the values of items found in all source datasets into a new target dataset.
DataExporter: Exports the resulting dataset into a data file (data.csv), writes a description of which datasets were combined into a commit file (commits.yml), and generates an accompanying data migration.
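The delegation between the three classes could look roughly like the sketch below. Only the class names come from this PR description; every method name, argument, and data shape is an assumption made for illustration, and the combination is reduced to a plain sum.

```ruby
# Hypothetical sketch of the delegation described above. Class names follow
# the PR description; all method names and data shapes are assumptions.
class ValueProcessor
  # Combines values per key across all source datasets (a plain sum here).
  def self.perform(datasets)
    datasets.each_with_object(Hash.new(0.0)) do |dataset, combined|
      dataset[:values].each { |key, value| combined[key] += value }
    end
  end
end

class DataExporter
  # Renders the combined values as CSV text. Writing commits.yml and the data
  # migration is omitted from this sketch.
  def self.to_csv(combined)
    (['key,value'] + combined.map { |key, value| "#{key},#{value}" }).join("\n")
  end
end

class DatasetCombiner
  def initialize(datasets:, source_data_year:)
    @datasets = datasets
    @source_data_year = source_data_year
  end

  # Validates the input first, then delegates the actual work.
  def call
    validate!
    DataExporter.to_csv(ValueProcessor.perform(@datasets))
  end

  private

  # Assumption: "at least the given analysis year" is read here as
  # analysis_year >= source_data_year.
  def validate!
    outdated = @datasets.select { |d| d[:analysis_year] < @source_data_year }
    return if outdated.empty?

    names = outdated.map { |d| d[:name] }.join(', ')
    raise ArgumentError, "Datasets older than #{@source_data_year}: #{names}"
  end
end
```

Keeping validation in the coordinating class means the two workers can assume clean input, and a failed check can surface as one readable error message.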
Usage
The script can be used from the command line. Here's an example:
To see the full list of arguments (required and optional) for this command, please run:
rails dataset:combine --help
Note
This PR includes the updated interface element YAML files from the dataset-amalgamator branch. It does not include the data migrations from that branch.
Closes #464 Closes #477