open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License

SDMX-ML: headline data isn't being recognized as headline data #53

Closed LucyGwilliamAdmin closed 5 years ago

LucyGwilliamAdmin commented 5 years ago

For example, here Kyrgyzstan should not appear in the Reference Area breakdown; it should just show as headline data.

Is there something we can do about this?

LucyGwilliamAdmin commented 5 years ago

@brockfanning: The Cambodia POC site also has the same issue. We can use the add_data_alteration function to replace "Kyrgyzstan" with "".

brockfanning commented 5 years ago

@LucyGwilliamAdmin I just tried it locally with Cambodia and unfortunately the graph doesn't display now. I'll dig into it a bit more.

brockfanning commented 5 years ago

@LucyGwilliamAdmin A partial success with Cambodia here.

To see the code that I changed, see here. Some notes:

  1. The "fix_meta" part is not actually working yet. I created this issue about it.
  2. There appears to be a problem with the edges being generated. From looking at the CSV, it should be possible to select "Degree of urbanisation" without selecting "Reference area". So I think there may be a bug in the edges code (though that code has been working fine as-is for quite a while now). I created this issue about it.
  3. General note - getting a working "headline" may be a matter of tweaking things. For example, for Cambodia I had to drop the "Source details" column, in addition to changing "Cambodia" to "" in the "Reference area" column.
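
As a rough illustration of note 3, the two tweaks could be sketched in plain pandas as below. The sample rows and values are made up for illustration, and the function name fix_data is borrowed from later in this thread; how it gets registered (for example through add_data_alteration) and the exact signature sdg-build expects are not shown here and should be checked against the version in use.

```python
import pandas as pd


def fix_data(df):
    """Hypothetical alteration: make the country-level rows act as headline data."""
    df = df.copy()
    # Drop the column that was getting in the way of the headline for Cambodia.
    # errors='ignore' keeps this safe for indicators that lack the column.
    df = df.drop(columns=['Source details'], errors='ignore')
    # Blank out the country name so it is no longer a disaggregation value.
    df['Reference area'] = df['Reference area'].replace('Cambodia', '')
    return df


# Made-up sample rows, only to show the shape of the change.
sample = pd.DataFrame({
    'Year': [2015, 2015, 2015],
    'Reference area': ['Cambodia', 'Cambodia', 'Cambodia'],
    'Degree of urbanisation': [None, 'Urban', 'Rural'],
    'Source details': ['Survey', 'Survey', 'Survey'],
    'Value': [10.0, 12.0, 8.0],
})

fixed = fix_data(sample)
```

The function works on a copy, so the input DataFrame is left untouched.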

LucyGwilliamAdmin commented 5 years ago

@brockfanning So I've just tried this here but I'm getting this error: https://travis-ci.org/ONSdigital/sdg-data/builds/597680086

LucyGwilliamAdmin commented 5 years ago

I have just changed git+git://github.com/open-sdg/sdg-build@0.4.1 in requirements.txt to git+git://github.com/open-sdg/sdg-build and it has built now.

LucyGwilliamAdmin commented 5 years ago

Ok great, this worked. One thing: in the fix_data function I used df['Reference area'] = df['Reference area'].replace('Kyrgyzstan', np.nan) instead of df['Reference area'] = df['Reference area'].replace('Kyrgyzstan', ''). For some reason, replacing with an empty string wouldn't allow me to select an option from the child disaggregation, so it's just something to be aware of.

Thanks so much for your help.
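
For anyone landing here later, a minimal sketch of the two variants side by side, using made-up sample rows. The replace_with parameter is a hypothetical addition for the comparison only. The nunique() check at the end is just one way to see that pandas treats a missing value and an empty string differently; it is not a description of how sdg-build computes its edges.

```python
import numpy as np
import pandas as pd


def fix_data(df, replace_with=np.nan):
    """Hypothetical alteration: blank out the country in 'Reference area'."""
    df = df.copy()
    df['Reference area'] = df['Reference area'].replace('Kyrgyzstan', replace_with)
    return df


# Made-up sample rows, only for illustration.
sample = pd.DataFrame({
    'Year': [2018, 2018, 2018],
    'Reference area': ['Kyrgyzstan', 'Kyrgyzstan', 'Kyrgyzstan'],
    'Degree of urbanisation': [None, 'Urban', 'Rural'],
    'Value': [5.0, 6.0, 4.0],
})

nan_version = fix_data(sample)                      # the variant that worked above
empty_version = fix_data(sample, replace_with='')   # the variant that caused trouble

# A missing value is not counted as a category, whereas '' is a real value.
nan_categories = nan_version['Reference area'].nunique()
empty_categories = empty_version['Reference area'].nunique()
```

With NaN the column has no distinct values left, while with '' it still has one, which is the kind of difference downstream code can trip over.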

brockfanning commented 5 years ago

@LucyGwilliamAdmin Yep, we were figuring out the same thing simultaneously. I just made a commit that should avoid that problem in the future, if someone chooses to use ''. So that resolves the "edges" issue I mentioned above. I'll go ahead and close this.