open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License

Allow dimension query in UN API input #207

Closed brockfanning closed 3 years ago

brockfanning commented 3 years ago

This allows a dimension query to be passed in to the UN API input, for more fine-grained control over the query. To show an example, the following are identical:

The original "reference_area" approach:

inputs:
  - class: InputSdmxMl_UnitedNationsApi
    reference_area: 524

And here's the new approach which would be identical:

inputs:
  - class: InputSdmxMl_UnitedNationsApi
    dimension_query:
      REF_AREA: 524

As another example, here's how to limit the query to national reporting:

inputs:
  - class: InputSdmxMl_UnitedNationsApi
    dimension_query:
      REF_AREA: 524
      REPORTING_TYPE: N

Or to pull multiple reference areas (e.g., Nepal and India) using the "+" syntax:

inputs:
  - class: InputSdmxMl_UnitedNationsApi
    dimension_query:
      REF_AREA: 524+356

More details on the query construction are here: https://unstats.un.org/sdgs/files/SDMX_SDG_API_MANUAL.pdf
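For illustration, here is a rough sketch of how a dimension_query mapping could be turned into the dot-separated key used in SDMX REST data queries, where an empty position means "all values" and "+" joins several values. The function name `build_sdmx_key` and the dimension order below are assumptions made up for this example, not the actual sdg-build code; the real order comes from the data structure definition published by the API.

```python
# Hypothetical dimension order, for illustration only. The real order is
# defined by the data structure definition (DSD) that the API publishes.
EXAMPLE_DIMENSION_ORDER = ["FREQ", "REPORTING_TYPE", "SERIES", "REF_AREA", "SEX"]


def build_sdmx_key(dimension_query, dimension_order=EXAMPLE_DIMENSION_ORDER):
    """Turn a dimension_query mapping into a dot-separated SDMX key."""
    unknown = set(dimension_query) - set(dimension_order)
    if unknown:
        raise ValueError("Unknown dimension(s): " + ", ".join(sorted(unknown)))
    # A dimension left out of the query becomes an empty position (wildcard).
    # Values such as "524+356" are passed through unchanged.
    return ".".join(str(dimension_query.get(dim, "")) for dim in dimension_order)


if __name__ == "__main__":
    print(build_sdmx_key({"REF_AREA": "524+356", "REPORTING_TYPE": "N"}))
```

With the assumed order above, every dimension that is not listed in dimension_query stays unconstrained, which matches the idea that the original "reference_area" setting is just a query on REF_AREA alone.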

brockfanning commented 3 years ago

@LucyGwilliamAdmin One note on this - I realized that the country data is not available in the UN API. So setting REPORTING_TYPE to "N" will actually yield zero results (and I think will also cause an error).

So although I think this is still useful, it can't be used to get the country data. I will check with UNSD about whether it's possible to get the country data.

LucyGwilliamAdmin commented 3 years ago

@brockfanning could you pull the latest from 1.3.0-dev if possible please? I'm getting a logging error when trying to test

brockfanning commented 3 years ago

Sure thing, I just merged the latest into this branch.

LucyGwilliamAdmin commented 3 years ago

@brockfanning I've put this into practice here: http://sdgdev-813006012.eu-west-1.elb.amazonaws.com/un-api-extra-parameters2/16-1-1/

Pulling data from 4 different reference areas: UK, England and Wales, Scotland, and Northern Ireland.

You said it's not possible to get data with REPORTING_TYPE: N, but are there any other queries that should be tested? Is it pretty much anything? e.g. I could do SEX: F

brockfanning commented 3 years ago

@LucyGwilliamAdmin Yep it should work with any dimension. I believe I got an error when I tried REPORTING_TYPE of N and my guess is that it was because it can't handle zero results. If that's the case we should address that as well, maybe in a separate PR or here in this one.

LucyGwilliamAdmin commented 3 years ago

@brockfanning what are your thoughts for handling zero results? Would checks pass or fail?

LucyGwilliamAdmin commented 3 years ago

4 diff ref areas: image

4 diff ref areas and sex=female: image

brockfanning commented 3 years ago

Just confirming, when I try with this...

inputs:
  - class: InputSdmxMl_UnitedNationsApi
    dimension_query:
      REF_AREA: 524
      REPORTING_TYPE: N

...I get this error: urllib.error.HTTPError: HTTP Error 404: Not Found

I think when the API has no results it might return a 404? That seems odd, but we should be able to work around it. I'll push a change. In my opinion zero results should simply import 0 indicators, but should not cause an error.

LucyGwilliamAdmin commented 3 years ago

@brockfanning I checked and also get that error when using REPORTING_TYPE: N

brockfanning commented 3 years ago

@LucyGwilliamAdmin Actually I reversed my opinion - if the fetching of the SDMX is failing, for whatever reason, it is probably something that should abort the build. Also, I noticed that having zero data causes other exceptions further down the line in other code. What do you think?

Meanwhile I've also added a more descriptive exception message in my latest commit.
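As a rough sketch of the "abort with a more descriptive message" idea (not the code from the commit itself), the fetch could catch urllib.error.HTTPError and re-raise it with a hint about the query. The function name `fetch_sdmx`, the `opener` parameter, and the message wording are all assumptions for this example.

```python
import urllib.error
import urllib.request


def fetch_sdmx(url, opener=urllib.request.urlopen):
    """Fetch SDMX content, re-raising HTTP failures with a clearer message.

    The opener parameter is hypothetical; it exists so the function can be
    exercised without a network call.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # Per the discussion, the API appears to answer 404 when the
            # query matches no data.
            hint = "The API may have no data matching the dimension_query."
        else:
            hint = "The request failed."
        # Raising here still aborts the build, but with more context.
        raise RuntimeError(
            f"Unable to fetch SDMX from {url} (HTTP {e.code}). {hint}"
        ) from e
```

Chaining with `from e` keeps the original HTTPError in the traceback, so the underlying status is still visible to anyone debugging the build.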

LucyGwilliamAdmin commented 3 years ago

I think aborting the build might be best - if people aren't familiar with the API, they may not know that what they're trying to fetch isn't available

jwestw commented 3 years ago

I noticed this PR and have an opinion. Am I correct in thinking a 404 might come about because:

In the case of the first one, we would want the build to fail. But in the case of the other two, it would be really frustrating if the site build failed due to missing data on the API side, so maybe we want the build to continue. Just an idea.

LucyGwilliamAdmin commented 3 years ago

My thoughts are if someone isn't that familiar with APIs and 2 or 3 occurs then they could be left wondering why no data is showing if the build doesn't abort

brockfanning commented 3 years ago

@LucyGwilliamAdmin @jwestw Based on the discussion above I think this one is all set. Were there any more concerns?