open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License

Allow suffix on imported metadata keys #253

Closed brockfanning closed 3 years ago

brockfanning commented 3 years ago

This allows the exact same schema to be used for both "global" and "national" metadata, for example, with a suffix added to one of them so that their keys stay distinct. For instance, "__GLOBAL" could be added to the end of each key in the global metadata.

A common use case, I expect, will be importing global metadata from the UNSD database and national metadata from Word files. Without this PR, these would "collide" because they use the same set of metadata keys. With this PR, one of them can have a suffix, so that both can coexist in the same metadata set.
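To illustrate the collision, here is a minimal sketch in plain Python. It is not the sdg-build implementation: the `add_suffix` helper, the example values, and the assumption that inputs are merged into one dictionary with later inputs overwriting earlier ones are all hypothetical.

```python
def add_suffix(metadata, suffix):
    """Return a copy of the metadata with the suffix appended to every key."""
    return {key + suffix: value for key, value in metadata.items()}


# Hypothetical metadata for one indicator from two sources sharing the same keys.
national = {"SDG_INDICATOR_INFO": "National description"}
global_meta = {"SDG_INDICATOR_INFO": "Global description"}

# Assumed merge behaviour: without a suffix, the later source overwrites the earlier one.
collided = {**national, **global_meta}

# With a suffix on the global keys, both values are kept.
merged = {**national, **add_suffix(global_meta, "__GLOBAL")}
```

In this sketch, `collided` keeps only one value for "SDG_INDICATOR_INFO", while `merged` holds both the national key and the suffixed global key.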

brockfanning commented 3 years ago

@LucyGwilliamAdmin Do you have any immediate objections or questions on this one? I was wondering if we might go ahead and merge it, as I think it would be a useful addition to 1.4.0. Happy to wait though if you would like to test.

LucyGwilliamAdmin commented 3 years ago

@brockfanning What testing is needed for this one? Uploading a national Word metadata file and a global Word metadata file and ensuring the fields display in the correct place?

brockfanning commented 3 years ago

@LucyGwilliamAdmin Since https://github.com/open-sdg/open-sdg-data-starter/pull/46 has been merged, the metadata_schema.yml on the data starter has two distinct sets of metadata fields: the national scope conforms exactly to the SDMX concepts, and the global scope is the same except that each field has the suffix "__GLOBAL". For example, "SDG_INDICATOR_INFO__GLOBAL".

Up to now, Open SDG implementations have needed to maintain global metadata in the data repository, but it would be much better if global metadata could simply be imported from the UN. In other words, it's not great that each country has to maintain global metadata separately.

The idea with this PR is that you would have a metadata input pulling global metadata from the UN directly, but you would add a suffix to it so that it gets "routed" to the global scope (i.e., displayed in the "Global metadata" tab).

This could be done with the following in "inputs":

inputs:
  # CSV data as normal.
  - class: InputCsvData
    path_pattern: data/*.csv
  # YAML config as normal.
  - class: InputYamlMeta
    path_pattern: indicator-config/*.yml
    git: false
  # YAML national metadata as normal.
  - class: InputYamlMeta
    path_pattern: meta/*.yml
    git: true
    git_data_dir: data
  # Global metadata directly from the UN
  - class: InputSdmxMeta
    source: https://unstats.un.org/SDGMetadataAPI/api/Metadata/SDMXReport/G.ALL.1
    meta_suffix: __GLOBAL

The important bit is that last one, the InputSdmxMeta. That should pull global metadata directly from the UN, and also add "__GLOBAL" to each of the fields, so that they end up showing in the "Global metadata" tab, by virtue of matching the "global" scope in the metadata_schema.yml file.
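As a rough illustration of that "routing", here is a sketch in plain Python. It is not how Open SDG or sdg-build actually does it: the `schema_scopes` mapping is a simplified, hypothetical stand-in for metadata_schema.yml, and `group_by_scope` and the example values are made up for this example.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for metadata_schema.yml: field name -> scope.
schema_scopes = {
    "SDG_INDICATOR_INFO": "national",
    "SDG_INDICATOR_INFO__GLOBAL": "global",
}


def group_by_scope(metadata, scopes):
    """Group metadata fields by the scope that the schema assigns to each key."""
    grouped = defaultdict(dict)
    for key, value in metadata.items():
        scope = scopes.get(key)
        if scope is not None:  # Fields missing from the schema are skipped.
            grouped[scope][key] = value
    return dict(grouped)


# Hypothetical merged metadata for one indicator, after the suffix has been applied.
indicator_meta = {
    "SDG_INDICATOR_INFO": "National description",
    "SDG_INDICATOR_INFO__GLOBAL": "Global description",
}

tabs = group_by_scope(indicator_meta, schema_scopes)
```

Here `tabs["global"]` contains only the suffixed field, which is the one that would appear in the "Global metadata" tab under these assumptions.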

That UN API does not yet have a full catalog of all the indicators' metadata, but I thought it would be nice to get this into 1.4.0 if possible, so that it will be ready to use once the UN API is fully populated.

LucyGwilliamAdmin commented 3 years ago

@brockfanning I had a look at this, but it seems that the build ran indefinitely until it eventually timed out: https://github.com/LucyGwilliamAdmin/nepal-data/runs/2822312949?check_suite_focus=true

brockfanning commented 3 years ago

@LucyGwilliamAdmin It looks like the UN Metadata API is having issues. I'll check with them and see if they can resolve it.

brockfanning commented 3 years ago

@LucyGwilliamAdmin The API has been improved, so I think it's OK to try this again.