open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License

Usage of the SDMX input classes #41

Closed brockfanning closed 5 years ago

brockfanning commented 5 years ago

Starting this issue to continue the conversation from #37

brockfanning commented 5 years ago

@LucyGwilliamAdmin You asked: what about indicators that don't have data? Similarly to having CSV as the input, would we need SDMX-ML placeholders?

That's a good question and I'm not sure what the best way is. Placeholders may be a good place to start. I guess these would be metadata-only indicators? I'm actually not clear on how metadata should be imported with SDMX yet. Some things I'm wondering:

  1. Is there going to be a totally separate SDMX file, perhaps using a separate DSD, for metadata?
  2. If so, is there a standard for that yet, or will there be?

You can see here that the open-sdg output currently only provides minimum required metadata. But I imagine that if the goal is to manage everything in SDMX, we will need to figure out how to import metadata from SDMX too.

LucyGwilliamAdmin commented 5 years ago

@brockfanning

  1. Metadata will be provided using SDMX eventually. I think this will be in separate files (but I'm not sure). The metadata will use a separate structure definition: an MSD (metadata structure definition) rather than a DSD.
  2. There isn't a standard for that yet but it is being developed.

So at the moment, there is no metadata in the Metadata tab panel? If so, is there any way we could have data coming from SDMX and metadata coming from .md?

brockfanning commented 5 years ago

Yes, without any non-SDMX sources of metadata, the metadata tabs are pretty bare right now - just the indicator id and the target id. We may be able to pull more metadata from the existing SDMX - that could be something to look into.

And yes, in theory it should be possible to combine SDMX data with YAML metadata. This object-oriented approach can send any number of "inputs" to a single "output". So there could be any number of SDMX inputs (like in this example) as well as any number of the YAML inputs (like in this example). All the inputs should be in a single list (like here), and then passed to the output (like here).
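
To make the "many inputs, one output" idea concrete, here is a minimal, self-contained sketch. It does not use the real sdg-build classes: `InputBase`, `FakeSdmxInput`, `FakeYamlMetaInput` and `Output` are hypothetical stand-ins, and the merge rule (data rows are appended, metadata keys are merged) is an assumption made for illustration.

```python
from typing import Dict, List


class InputBase:
    """Hypothetical base class: an input returns {indicator_id: {...}}."""

    def load(self) -> Dict[str, dict]:
        raise NotImplementedError


class FakeSdmxInput(InputBase):
    """Stand-in for an SDMX input; it only supplies data rows."""

    def __init__(self, rows_by_indicator: Dict[str, List[dict]]):
        self.rows_by_indicator = rows_by_indicator

    def load(self) -> Dict[str, dict]:
        return {iid: {"data": list(rows)}
                for iid, rows in self.rows_by_indicator.items()}


class FakeYamlMetaInput(InputBase):
    """Stand-in for a YAML/Markdown metadata input; it only supplies metadata."""

    def __init__(self, meta_by_indicator: Dict[str, dict]):
        self.meta_by_indicator = meta_by_indicator

    def load(self) -> Dict[str, dict]:
        return {iid: {"meta": dict(meta)}
                for iid, meta in self.meta_by_indicator.items()}


class Output:
    """Hypothetical output: merges every input into one record per indicator."""

    def __init__(self, inputs: List[InputBase]):
        self.inputs = inputs

    def build(self) -> Dict[str, dict]:
        merged: Dict[str, dict] = {}
        for source in self.inputs:
            for iid, parts in source.load().items():
                indicator = merged.setdefault(iid, {"data": [], "meta": {}})
                indicator["data"].extend(parts.get("data", []))
                indicator["meta"].update(parts.get("meta", {}))
        return merged


# All inputs go into a single list, which is then passed to the output.
inputs = [
    FakeSdmxInput({"1-1-1": [{"Year": 2015, "Value": 10.0}]}),
    FakeYamlMetaInput({
        "1-1-1": {"indicator_name": "Example indicator"},
        "1-2-1": {"indicator_name": "Metadata-only indicator"},
    }),
]
result = Output(inputs).build()
```

In this sketch an indicator that appears only in the metadata input (`1-2-1`) still ends up in the result with an empty data list, which is one possible way to handle indicators that have no data.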

LucyGwilliamAdmin commented 5 years ago

I'm trying to add metadata from md files in the meta folder, so I have made changes to this file (lines 64-69), but the metadata doesn't seem to be coming through to the feature branch (the travis checks passed though).

LucyGwilliamAdmin commented 5 years ago

Never mind, a couple of things were missing - metadata is now showing on the test branch, but I am getting an error in the console: [image] [image] Something to do with the data not being in CSV, maybe?

LucyGwilliamAdmin commented 5 years ago

Guessing it's something to do with metadata and data reacting together, as data isn't showing now that metadata is showing?

LucyGwilliamAdmin commented 5 years ago

@brockfanning any idea?

LucyGwilliamAdmin commented 5 years ago

I now have SDMX-ML files for all indicators (reported and placeholders) as well as a metadata file for each indicator on this branch, but now neither data nor metadata is showing on the feature branch. I'm still getting the error mentioned above, and also lots of validation errors in the travis build.

LucyGwilliamAdmin commented 5 years ago

Actually metadata is showing

brockfanning commented 5 years ago

Let me give this a try locally and see if anything jumps out at me. More soon.

brockfanning commented 5 years ago

@LucyGwilliamAdmin Locally I also got some metadata validation errors. Before diving into the SDMX stuff, let's resolve those errors.

First I saw a whole bunch of these:

Validation errors for indicator [some indicator id]
None is not of type 'string', 'integer'

It would be great if the errors displayed the field name (possible future improvement?) but I figured out that these are in reference to the data_keywords key. Many indicators have nothing there, and instead need to have at least empty quotes.

Next I saw this:

Validation errors for indicator 16-1-1
True is not of type 'string', 'integer'
datetime.datetime(2019, 3, 15, 0, 0) is not of type 'string', 'integer'

This pointed out some problems in 16-1-1's indicator_name and graph_title fields.

Last I saw this:

Validation errors for indicator 9-c-1
'Y' is not of type 'boolean'

This pointed out some problems in 9-c-1's data_show_map field.

After fixing these issues, validation passes again. I didn't go further than that though, so I'm not sure if that all helps with the SDMX problem.
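
As a rough illustration of the three kinds of errors above, and of how a message could include the field name, here is a small stdlib-only sketch. It is not the JSONSchema validation that sdg-build runs: `SCHEMA`, `TYPE_NAMES` and `validate` are hypothetical, and which of 16-1-1's two fields held `True` versus the date is an assumption.

```python
import datetime

# Hypothetical schema: field name -> Python types it may hold.
SCHEMA = {
    "indicator_name": (str, int),
    "graph_title": (str, int),
    "data_keywords": (str, int),
    "data_show_map": (bool,),
}

# JSON Schema style names for the Python types used above.
TYPE_NAMES = {str: "string", int: "integer", bool: "boolean"}


def validate(meta):
    """Return a list of error messages, each prefixed with the field name."""
    errors = []
    for field, allowed in SCHEMA.items():
        if field not in meta:
            continue
        value = meta[field]
        # bool is a subclass of int in Python, so reject booleans
        # explicitly unless the field really allows them.
        if isinstance(value, bool) and bool not in allowed:
            ok = False
        else:
            ok = isinstance(value, allowed)
        if not ok:
            expected = ", ".join(f"'{TYPE_NAMES[t]}'" for t in allowed)
            errors.append(f"{field}: {value!r} is not of type {expected}")
    return errors


# Values as an unquoted YAML file might produce them (assumed assignment).
bad = {
    "indicator_name": True,
    "graph_title": datetime.datetime(2019, 3, 15),
    "data_keywords": None,
    "data_show_map": "Y",
}

# The same fields after quoting the text values and using a real boolean.
fixed = {
    "indicator_name": "Example indicator",
    "graph_title": "2019-03-15",
    "data_keywords": "",
    "data_show_map": True,
}

bad_errors = validate(bad)
fixed_errors = validate(fixed)
```

Here `bad_errors` holds one message per field, and `fixed_errors` is empty.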

Note about validation: The new object-oriented approach to validation (what you're using here) is totally different from the old sdg-build validation. The new approach uses "JSONSchema" validation. So it's not surprising that the UK metadata is suddenly not passing validation: it has never been run through this JSONSchema validation before now.

brockfanning commented 5 years ago

Forgot to mention, I put up a PR with my fixes to those metadata issues here.

LucyGwilliamAdmin commented 5 years ago

Ok great, thanks. I've merged that PR and I'm no longer getting the validation errors. Still not sure what's causing this error though: [image]

LucyGwilliamAdmin commented 5 years ago

What does this function do? [image]

brockfanning commented 5 years ago

That open-sdg code needs some commenting for sure - but my guess is that it converts the JSON produced by sdg-build into a format more directly usable by open-sdg.

LucyGwilliamAdmin commented 5 years ago

Should it work in exactly the same way for CSV and SDMX files?

brockfanning commented 5 years ago

To be clear, that code is from open-sdg, and open-sdg only sees the output of sdg-build. The output of sdg-build is always the same, regardless of whether the input is CSV or SDMX. So yes, as long as sdg-build is doing its job, that open-sdg code should not notice any difference between CSV and SDMX data sources.

brockfanning commented 5 years ago

I think the next thing to look at is why sdg-build is not generating data: https://sdg.mango-solutions.com/data/sdmx/comb/1-1-1.json

This may be where the actual bug lies. I'll try to look at it when able, but I think that's the next hurdle.

LucyGwilliamAdmin commented 5 years ago

Could it be something to do with the metadata files? Until I added them to the input, the data was showing.

LucyGwilliamAdmin commented 5 years ago

@brockfanning are there any updates on this?

brockfanning commented 5 years ago

@LucyGwilliamAdmin Actually I did fix a bug about a week ago. Can you give it a try with version 0.4.1 of sdg-build?

LucyGwilliamAdmin commented 5 years ago

Yeah, will do that now

LucyGwilliamAdmin commented 5 years ago

@brockfanning great, this worked!

LucyGwilliamAdmin commented 5 years ago

@brockfanning do you think there is a better format to have this, which will make it easier for countries to maintain?

brockfanning commented 5 years ago

@LucyGwilliamAdmin It would depend on the team involved. I've found that the simplest format is probably CSV, since it can be edited with Excel. JSON can be confusing because of the extra syntax (braces, quotes, etc.). YAML might be better than JSON, but would need to be edited in a text editor (i.e., could not be edited in Excel).

LucyGwilliamAdmin commented 5 years ago

Yes, I was thinking CSV - originally the mapping was in CSV format, but when I tried to convert this to JSON in the script there were some issues with commas (see here)
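
For reference, commas inside values usually stop being a problem if the CSV is read with Python's `csv` module rather than split on commas by hand, provided such values are quoted in the file (Excel does this when saving). A minimal sketch, assuming a two-column mapping; the column names `source_code` and `target_label` and the sample rows are hypothetical, not the real mapping file:

```python
import csv
import io
import json

# Hypothetical mapping file contents; note the quoted value containing commas.
CSV_TEXT = '''source_code,target_label
A,"Agriculture, forestry and fishing"
B,Mining and quarrying
'''


def csv_mapping_to_json(text):
    """Convert a two-column CSV mapping into a JSON object string."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = {row["source_code"]: row["target_label"] for row in reader}
    return json.dumps(mapping, indent=2)


mapping_json = csv_mapping_to_json(CSV_TEXT)
```

With a real file, `open(path, newline="", encoding="utf-8")` would take the place of `io.StringIO`.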

brockfanning commented 5 years ago

@LucyGwilliamAdmin I'll go ahead and close this, but please re-open or start a new issue if there is anything left to cover.