Closed brockfanning closed 2 years ago
@brockfanning I'm trying to alter metadata based on data but I'm actually not too sure how this new functionality works?
@LucyGwilliamAdmin I added some examples in the README: https://github.com/open-sdg/sdg-build/blob/33c54439e85bbfe0ed26e7d7725d1bb1b57d0b60/README.md#alterations-of-data-metadata-indicator-id-and-indicator-name
@brockfanning thanks - does that code go in the check and build data python files?
Does this sound about right for setting reporting status to complete if there's data?
def alter_meta(meta, context):
if context[data].empty == False:
meta['reporting_status']='complete'
return meta
my_input.add_meta_alteration(alter_meta)
Also what is my_input
?
Ah sorry, my_input
is in reference to using the inputs directly, rather than through the data_config.yml approach. You can ignore that. All that is changing is that you can now use a second parameter in your existing alter functions: context
.
Your snippet looks pretty good, except there should be quotes around data, like: context['data']
@brockfanning thank you - what is data if it is empty? I'm getting this error when I try to check if it's an empty df: https://github.com/open-sdg-nepal/data/runs/2555712994?check_suite_focus=true#step:5:14
EDIT:
Think I've got it if not context['data'] is None:
Cool if that works - I've always put the "not" in a different place: if context['data'] is not None
. You can also check for both:
if context['data'] is not None and context['data'].empty == False
@brockfanning here are my changes but it doesn't seemed to have work: https://github.com/open-sdg-nepal/data/pull/33/files
@LucyGwilliamAdmin I think it is a problem with the fact that there is an indicator config input that has reporting_status set to notstarted. A quick fix might be to move the indicator config input to be first, before the SDMX input. Longer term we could think more about how these interact, because I agree that it's unexpected. (You would think that even when the indicator config is being altered, the data would still be there.)
Ah yes, moving it fixed the issue - thank you :)
Thinking a bit more on it, it will be tough/impossible to make each input have access to all the other inputs' stuff. So I think the best solution is to make clear in our documentation that the alter functions operate on each input separately.
@brockfanning is context
viewable anywhere?
@brockfanning how do I call my_indicator_id_alteration function in build_data.py?
Currently I have open_sdg_build(config='config_data.yml', alter_meta=alter_meta, add_indicator_id_alteration=my_indicator_id_alteration)
but that doesn't seem to work
@LucyGwilliamAdmin Try alter_indicator_id=my_indicator_id_alteration
. Similarly, the parameter for altering names is alter_indicator_name
.
To see what's in context you might put print(context)
in the function and then look at the output during the build.
@brockfanning I'm not too sure what's happening or what's expected to happen
I'm trying to alter indicator_id
to '1-11-1' when indicator_name
is 'Proportion of population below the international poverty line, by sex, age, employment status and geographical location (urban/rural)'
Then I'm trying to alter indicator_name
with indicator_id
'1-11-1' to 'Indicator name alteration test'
See here: https://github.com/LucyGwilliamAdmin/sdg-data-sandpit/blob/develop/scripts/build_data.py#L36
The checks seem to be passing but I'm not finding '1-11-1' or 'Indicator name alteration test' anywhere
@brockfanning actually, are context['indicator_name'] and meta['indicator_name'] not the same?
@LucyGwilliamAdmin The indicator_name
setting in the metadata/config is indeed where Open SDG gets the indicator name from. Give this a try - instead of this:
if context['indicator_name'] == 'Proportion of population below the international poverty line, by sex, age, employment status and geographical location (urban/rural)':
try this:
if context['indicator_name'] == 'global_indicators.1-1-1-title':
@brockfanning making that change has changed 1-1-1 to 1-11-1 but the name doesn't seemed to have changed in the built data:
https://lucygwilliamadmin.github.io/sdg-data-sandpit/en/meta/all.json
It looks like it's changed when I print context
though:
https://github.com/LucyGwilliamAdmin/sdg-data-sandpit/runs/2656864657?check_suite_focus=true#step:5:3724
Also should the data also have moved to 1.11.1 - I can see the data alteration happens after the id and name change but don't know how else to move it? Plus the metadata has moved
@LucyGwilliamAdmin I think this is highlighting the difficulty caused by the fact that each input is "altered" separately.
For example, your id alteration function depends on the indicator name. This works fine during the metadata input, because the indicator name is right there in the metadata. But during the data input, sdg-build has no idea what the indicator name is. All it has is a CSV file. So during the alteration of the data input, context['indicator_name']
will be None
.
Put another way - sdg-build operates by taking all of the inputs and then combining them according to their indicator id. But these alteration functions happen before the inputs have been combined. So, though you might expect that the alteration function is happening one time, on the complete indicator - it is actually happening multiple times - once per input, and each time on an incomplete indicator.
An obvious question might be: can't we change it so that the alteration only happens once, after all the inputs have been merged into complete indicators? That would definitely make it behave more intuitively. I think this might be feasible, but could be an impactful change. I'll give it a shot so you can try it out.
@LucyGwilliamAdmin Actually it would be a bit easier to try that out after #242 is merged. Otherwise we'll end up with some duplicate code, and #242 introduces that "helper" setup which is a good way to share code. Do you think we could focus on that PR first and then come back to this? I'll set this to draft.
@LucyGwilliamAdmin Since this one turned out to be more complicated and less useful than expected, I was thinking we could split it up into smaller parts. I've just opened #285 which contains a portion of this. I'll close this in the meantime, and maybe we can add other portions of this in the future if needed.
This includes the following changes:
Existing data/meta alter functions should still work. Examples and more details are in the updated README in this PR.
Testing note
In particular I would be interested in testing the indicator ID alteration, to make sure it doesn't cause any issues. The other stuff is important to test also though.