theodi / open-data-certificate

The mark of quality and trust for open data
https://certificates.theodi.org/
MIT License

Current certification levels are disheartening, was: ONS issue #139

Closed davidread closed 8 years ago

davidread commented 11 years ago

I'm wondering if there is an issue with most existing datasets not scoring anything on the scale marked by this certificate. Is the ODI effectively branding most datasets, including lots of successful ones, as 'not even good enough for a pilot'?

For example, a cornerstone of the UK's economy is the quarterly GDP/growth and inflation data from ONS. I think most people would agree that it is "key information infrastructure", is "routinely published open data that encourages reuse" and "It provides sufficient support to prompt some reuser experimentation with a data.". Yet as far as I'm aware, it trips up on ODI's certificate by not having any public statements encouraging feedback, so it would not score even at the pilot level for "initial forays into publishing open data". It misses the standard level by a long shot, e.g. because it has no URIs, let alone the machine-readable provenance etc. needed to be exemplar.

Of course ODI should make suggestions on how ONS might improve its publishing, but ONS deserves a lot of credit: it collects the data with great care, defends the methods in peer-reviewed journals, and publishes with detailed methodology and discussion of the limitations. The data gets huge interest, is widely reused, is often combined with other datasets and has been timely for decades.

I love that the certificate aims high and includes lots of aspirations. But what concerns me is how the descriptions of each badge might be construed by those publishers with datasets that don't get them (i.e. pretty much all datasets right now). The certificate was envisioned as a carrot to improve when so much data publishing is rightly criticised. But with publishers that are pretty good still scoring nowhere on the certificate, could the ODI be seen as insulting, even?

I wonder if it would help to make the levels all more positive, and to be more specific about some of the advantages of the linked data principles that you are promoting, rather than describing abstract greater or lesser amounts of reuse. For example:

Basic - (no change)

Good - this data supports reuser experimentation with the data.

Progressive - uses modern techniques to aid reusers to extract value across multiple datasets.

Exemplar - the data is embedded in the National Data Backbone, providing full linked data services.

JeniT commented 11 years ago

One way of tackling this is to break down the levels into the different sections. For example, ONS data might be 'Exemplar' on Legal and Practical, 'Standard' on Technical and 'Basic' on Social. This would illustrate how they are really good (as you say) in some areas, but have other areas where there's room for improvement.

@jolankester @benfoxall is this something we could easily incorporate?
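
To make the per-section idea concrete, here is a rough sketch in Python. It is only an illustration, not how the certificate is implemented: the `Level` ordering, the `overall_level` rule (taking the lowest section level as the overall result) and the function names are all assumptions made for this example. The section names and the ONS values are the hypothetical ones from the comment above.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed ordering, lowest to highest; the names come from this thread,
    # not from the certificate's actual definitions.
    BASIC = 0
    PILOT = 1
    STANDARD = 2
    EXEMPLAR = 3


def overall_level(sections):
    """Return the lowest level across all sections (an assumed aggregation rule)."""
    return min(sections.values())


def summarize(sections):
    """Render one 'Section: Level' pair per section, in insertion order."""
    return ", ".join(
        f"{name}: {level.name.title()}" for name, level in sections.items()
    )


# Hypothetical per-section result, matching the example given for ONS data.
ons_example = {
    "Legal": Level.EXEMPLAR,
    "Practical": Level.EXEMPLAR,
    "Technical": Level.STANDARD,
    "Social": Level.BASIC,
}

print(summarize(ons_example))
print("Overall:", overall_level(ons_example).name.title())
```

Showing the breakdown alongside a single overall level would let a publisher see where they already do well, even when one weak section holds the overall result down.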

JeniT commented 11 years ago

I've just been through the certificate for this dataset. As you indicate, ONS scores really highly (Exemplar) on the Practical part of the certificate (I was blown away by their quality control documentation).

I had to use the data.gov.uk pages as the reference point for their data documentation. The main thing that prevents them from getting Pilot status is actually the lack of machine-readable metadata in either that page (there's no machine-readable description or release dates at http://data.gov.uk/dataset/second_estimate_of_gdp) or in their copyright statement on their website.

They don't do particularly well on the Technical front because of using Excel (I might have missed a publication route that uses SDMX or something easier to process). They do OK on the Social side, as they do have contact points for queries and so on.

The general thing I learn from this is that the certificate tells the truth about what ONS is good and bad at, from the perspective of someone reusing this data. But I think you're absolutely right that we need to keep the messaging positive about people managing to publish open data at all.

shevski commented 8 years ago

Badges now renamed