semio opened 7 years ago
Current design:

```yaml
- procedure: merge_entity
  ingredients:
    - input_ingredient
  options:
    dictionary: merge.json
    merged: keep  # what to do with the entities to be merged
    target_column: entity_name
  result: output_ingredient
```
In `merge.json`:

```json
{
    "new_entity_1": ["old_entity_1", "old_entity_2"],
    "new_entity_2": ["old_entity_3", "old_entity_4"]
}
```
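To make the intended semantics of `merge_entity` concrete, here is a minimal Python sketch of what the procedure could do with the `merge.json` dictionary. It is not the ddf_utils implementation. Assumptions, all hypothetical: datapoints for one indicator are a plain dict mapping `(entity, time)` to a numeric value; values of the entities being merged are summed per time period; `merged: keep` keeps the old entities next to the new one and `merged: drop` removes them.

```python
def merge_entity(datapoints, dictionary, merged="keep"):
    """Hypothetical sketch of merge_entity for one indicator.

    datapoints: dict mapping (entity, time) -> numeric value (assumed format)
    dictionary: content of merge.json, {"new": ["old_1", "old_2"], ...}
    merged:     "keep" keeps the old entities, "drop" removes them (assumed values)
    """
    if merged not in ("keep", "drop"):
        raise ValueError(f"unknown 'merged' option: {merged!r}")

    # Invert {"new": ["old_1", "old_2"]} into {"old_1": "new", "old_2": "new"}.
    old_to_new = {}
    for new, olds in dictionary.items():
        for old in olds:
            if old in old_to_new:
                raise ValueError(f"{old!r} is listed under more than one new entity")
            old_to_new[old] = new

    result = {}
    totals = {}
    for (entity, time), value in datapoints.items():
        if entity in old_to_new:
            # Assumption: values of merged entities are summed per time period.
            key = (old_to_new[entity], time)
            totals[key] = totals.get(key, 0) + value
            if merged == "keep":
                result[(entity, time)] = value
        else:
            result[(entity, time)] = value

    # The summed rows overwrite any rows the new entity already had.
    result.update(totals)
    return result
```

Inverting the dictionary first also catches an old entity listed under two new entities, which would otherwise be counted twice.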
```yaml
- procedure: split_entity
  ingredients:
    - input_ingredient
  options:
    dictionary: split.json
    splitted: keep  # what to do with the entities to be split
    target_column: entity_name
  result: output_ingredient
```
In `split.json`:

```json
{
    "entity_to_split_1": ["sub_entity_1", "sub_entity_2"],
    "entity_to_split_2": ["sub_entity_3", "sub_entity_4"]
}
```
This assumes that `sub_entity_1` to `sub_entity_4` exist in the dataset. The split ratio will be calculated from the first valid values of the sub-entities.
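The split rule described above can be sketched in the same way. This is again a hypothetical illustration, not ddf_utils code, and it rests on several assumptions: datapoints use the same `(entity, time) -> value` dict, with `None` marking a missing value; "first valid value" is read as each sub-entity's value at its own earliest time with non-missing data; each ratio is that value divided by the sum over all sub-entities; and the parent's values are only used to fill periods where a sub-entity has no data of its own. `splitted: keep` / `drop` is assumed to mirror `merged`.

```python
def first_valid(datapoints, entity):
    """Value at the earliest time where `entity` has a non-None value, else None."""
    times = sorted(t for (e, t), v in datapoints.items() if e == entity and v is not None)
    return datapoints[(entity, times[0])] if times else None


def split_entity(datapoints, dictionary, splitted="keep"):
    """Hypothetical sketch of split_entity for one indicator.

    dictionary: content of split.json, {"parent": ["sub_1", "sub_2"], ...}
    splitted:   "keep" keeps the parent entity, "drop" removes it (assumed values)
    """
    if splitted not in ("keep", "drop"):
        raise ValueError(f"unknown 'splitted' option: {splitted!r}")

    result = dict(datapoints)
    for parent, subs in dictionary.items():
        # The sub-entities must already exist in the dataset with some valid value.
        firsts = {s: first_valid(datapoints, s) for s in subs}
        missing = [s for s, v in firsts.items() if v is None]
        if missing:
            raise ValueError(f"no valid values for sub-entities {missing} of {parent!r}")
        total = sum(firsts.values())
        if total == 0:
            raise ValueError(f"first valid values of {parent!r}'s sub-entities sum to 0")
        ratios = {s: v / total for s, v in firsts.items()}

        # Distribute each parent value by the ratios, without overwriting
        # values the sub-entities already have for that period.
        for (entity, time), value in datapoints.items():
            if entity != parent or value is None:
                continue
            for s, r in ratios.items():
                if result.get((s, time)) is None:
                    result[(s, time)] = value * r

        if splitted == "drop":
            for key in [k for k in result if k[0] == parent]:
                del result[key]
    return result
```

Because each ratio comes from the sub-entity's own earliest observation, sub-entities whose data start in different years end up compared across different periods under this reading of the rule.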
Problem:
See https://github.com/open-numbers/ddf--gapminder--co2_emission/issues/1