open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License

SDMX global output #268

Closed brockfanning closed 3 years ago

brockfanning commented 3 years ago

This adds functionality to make it easier to produce globally-compatible SDMX output. Here are the features in this PR:

  1. The SDMX output has a new parameter called output_subfolder which can be used to change the default sdmx subfolder. This allows you to have multiple SDMX outputs (such as one national and one global).
  2. The SDMX output has a new parameter called global_content_constraints. If set to true, this will drop all rows of data that don't meet the global content constraints per series. (For example, this drops data for series SI_COV_MATNL if SEX is not set to F.) If the logging parameter includes 'warn', then details about the errors will be logged during the build.
  3. The YAML config file for Open SDG has a new option: sdmx_output_global. This works exactly the same as the sdmx_output option, except that the following parameters are automatically set:

    dsd: (the global DSD)
    msd: (the global MSD)
    structure_specific: true
    constrain_data: true
    constrain_meta: true
    global_content_constraints: true
    output_subfolder: sdmx-global

    If no other customizations are needed beyond these, then sdmx_output_global can simply be set to true. Example:

    sdmx_output_global: true

    Otherwise it can have all the same parameters as the existing sdmx_output option. Example:

    sdmx_output_global:
        meta_reporting_type: N
        meta_ref_area: KG
        etc...
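The two forms of sdmx_output_global above (a plain true, or a mapping of parameters) could be resolved along these lines. This is only a minimal sketch, not the code in this PR: the function name, the placeholder DSD/MSD values, and the choice to let the automatic values win over user-supplied ones are all assumptions made for illustration.

```python
# Hypothetical sketch; names and precedence rules are assumptions,
# not the actual sdg-build implementation.
GLOBAL_DEFAULTS = {
    "dsd": "<global DSD>",  # placeholder, the real location is not given here
    "msd": "<global MSD>",  # placeholder, the real location is not given here
    "structure_specific": True,
    "constrain_data": True,
    "constrain_meta": True,
    "global_content_constraints": True,
    "output_subfolder": "sdmx-global",
}


def resolve_global_sdmx_params(option):
    """Turn an sdmx_output_global value into a full parameter dict.

    Returns None when the option is absent or false.
    """
    if option is True:
        return dict(GLOBAL_DEFAULTS)
    if isinstance(option, dict):
        # Assumption: the automatically set values take precedence.
        return {**option, **GLOBAL_DEFAULTS}
    return None


simple = resolve_global_sdmx_params(True)
custom = resolve_global_sdmx_params({"meta_reporting_type": "N", "meta_ref_area": "KG"})
```

In this sketch, `custom` carries both the user's `meta_*` parameters and the seven automatic ones, while `simple` carries only the automatic ones.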

Both the existing sdmx_output and the new sdmx_output_global can be used at the same time.
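To make the global_content_constraints behaviour from item 2 more concrete, here is a rough sketch of per-series row filtering. It is an illustration under stated assumptions rather than the code in this PR: the function name, the row layout (plain dicts keyed by SERIES and SEX), and the `warn` flag are hypothetical, and only the SI_COV_MATNL / SEX = F pair comes from the description above.

```python
# Hypothetical sketch of per-series content constraints.
# Only the SI_COV_MATNL / SEX = F rule is taken from the PR description.
CONTENT_CONSTRAINTS = {
    "SI_COV_MATNL": {"SEX": {"F"}},
}


def apply_content_constraints(rows, constraints, warn=False):
    """Return only the rows that satisfy the constraints for their series."""
    kept = []
    for row in rows:
        allowed = constraints.get(row.get("SERIES"), {})
        # Collect every dimension whose value is outside the allowed codes.
        violations = [
            (dimension, row.get(dimension))
            for dimension, codes in allowed.items()
            if row.get(dimension) not in codes
        ]
        if violations:
            if warn:
                print(f"Dropping row for {row.get('SERIES')}: {violations}")
            continue
        kept.append(row)
    return kept


# Made-up rows, for illustration only.
rows = [
    {"SERIES": "SI_COV_MATNL", "SEX": "F"},
    {"SERIES": "SI_COV_MATNL", "SEX": "M"},
    {"SERIES": "SI_COV_BENFTS", "SEX": "M"},
]
kept = apply_content_constraints(rows, CONTENT_CONSTRAINTS, warn=True)
```

Here the second row is dropped because SEX is not F for SI_COV_MATNL, and the third row is kept because this illustrative table has no constraint for its series.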

brockfanning commented 3 years ago

Here is an example of this in action: http://brock.tips/fcdo-kg-data/

Notice there are two SDMX outputs - the second one is the global one.

LucyGwilliamAdmin commented 3 years ago

@brockfanning looks good

Should the global series be both in the global output and national output?

I've just been having a quick look at the files to check if it's constraining properly but I'm not sure about one of the series:

  1. Go to http://brock.tips/fcdo-kg-data/sdmx/all.xml
  2. Search for "SI_COV_BENFTS"
  3. See first series i.e. where REF_AREA is "KG"

It's in both outputs i.e. global and national

EDIT: ignore my question, I'm pretty sure that's right actually!

brockfanning commented 3 years ago

Yep, it's OK for "KG" to be in the "global" output, since that code is in the global DSD. The ones that are dropped from the global output are the subnational codes, like "KG01", "KG02", etc.
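As a rough picture of that effect, the filtering can be thought of as keeping only rows whose REF_AREA appears in the global codelist. This is a sketch under assumptions, not the actual implementation: the function name is hypothetical and the code set below is a one-item stand-in for the real global codelist.

```python
# Illustrative stand-in only; the real global REF_AREA codelist is far larger.
GLOBAL_REF_AREA_CODES = {"KG"}


def keep_global_ref_areas(rows, allowed_codes=GLOBAL_REF_AREA_CODES):
    """Keep only rows whose REF_AREA is in the allowed code set."""
    return [row for row in rows if row.get("REF_AREA") in allowed_codes]


# Made-up rows using the codes mentioned above.
rows = [{"REF_AREA": "KG"}, {"REF_AREA": "KG01"}, {"REF_AREA": "KG02"}]
global_rows = keep_global_ref_areas(rows)
```

With these inputs only the national "KG" row survives; the national output would skip this filter and keep all three.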