theodi / open-data-certificate

The mark of quality and trust for open data
https://certificates.theodi.org/
MIT License

Add reference to "scoping" information #7

Closed ldodds closed 11 years ago

ldodds commented 11 years ago

Another useful facet that could be added to the certificate is pointers to "scope" or "coverage" information that describes the content of a dataset.

Geographical datasets might have a regional scope.

Datasets containing events or time-related information might have specific dates that apply to the data, e.g. historical data points or current events.

Understanding the scope/coverage of a dataset will help both technical and non-technical users understand how it can be applied. I'd suggest this was a Silver level requirement. Scope information might typically be covered in a summary of a dataset, but could be more detailed.

JeniT commented 11 years ago

I absolutely think this is important.

But I think there's a question about what level of metadata to have within the certificate, and what should be present within the page(s) that document the dataset. There are common metadata fields like title, creator, scope, coverage, subject, which we'd hope to have within the documentation page for the dataset.

Instead of adding those metadata fields into the certificate questions, perhaps we should have a set of checkboxes (or similar) to indicate whether or not the documentation includes information about each of those aspects? When we eventually get automatic population of the questionnaire, those checkboxes can be filled in automatically.
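As a rough illustration of the checkbox idea, here is a minimal Python sketch. The aspect names come from the fields listed above; the function name, the shape of the `metadata` mapping, and the sample values are assumptions made for the example, not part of the certificate tool.

```python
from typing import Dict, Mapping

# Aspects named in the comment above. The list is illustrative,
# not the questionnaire's actual schema.
DOCUMENTED_ASPECTS = ("title", "creator", "scope", "coverage", "subject")


def checklist_from_metadata(metadata: Mapping[str, object]) -> Dict[str, bool]:
    """Tick an aspect when the documentation metadata has a non-empty value for it."""
    return {aspect: bool(metadata.get(aspect)) for aspect in DOCUMENTED_ASPECTS}


# Hypothetical metadata harvested from a documentation page.
example = checklist_from_metadata(
    {"title": "Bus stops", "coverage": "Greater Manchester", "subject": ""}
)
```

Because the answers are plain booleans, the same structure could be filled in by hand now and by an automatic harvester later.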

ldodds commented 11 years ago

Having a check list of things to include in the documentation sounds like a reasonable approach to me.

I understand that the certificate tool isn't intended to cover all items of metadata, but it will have a strong feedback effect on guiding data publishers towards doing the Right Thing when publishing data. A check-list of things to document will work just as well.

JeniT commented 11 years ago

Added (in v0.3) questions about published metadata for the dataset and for distributions (in 'Documentation' under 'Social', if you've provided a documentation URL). The fields are taken from DCAT and I'm not sure they cover everything (for example there's no link to a copyright statement, only directly to a licence). Also not sure about the levels for each of the fields.

ldodds commented 11 years ago

These look good.

I think title, description, publisher and release date are all Pilot requirements. No excuse for not saying when something was published and by whom.

The rest feel mostly like Standard level requirements. E.g. information on the frequency of releases and pointers to distributions should be consistent with the answers to "How do you publish datasets in the series". Frequency information and lists are Standard requirements there.

The only argument for making more metadata "exemplar" is if we think it's unnecessary, or relatively unimportant for most typical uses.
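To make the level assignment concrete, here is a small Python sketch of how a checklist could map to an attained level. The field-to-level mapping follows the suggestion above; the identifiers, the assumption that Pilot sits below Standard, and the omission of "exemplar" (no fields are assigned to it in this thread) are all choices made for the illustration.

```python
from typing import Optional, Set

# Assumed ordering, lowest first. "exemplar" is left out because
# no field has been assigned to it in this discussion.
LEVELS = ("pilot", "standard")

# Hypothetical field names; levels follow the suggestion above.
REQUIRED_AT = {
    "title": "pilot",
    "description": "pilot",
    "publisher": "pilot",
    "release_date": "pilot",
    "frequency": "standard",
    "distribution": "standard",
}


def attained_level(documented: Set[str]) -> Optional[str]:
    """Return the highest level whose fields, and all lower levels' fields, are documented."""
    attained = None
    for level in LEVELS:
        needed = {field for field, lvl in REQUIRED_AT.items() if lvl == level}
        if not needed <= documented:
            break  # a missing field blocks this level and every level above it
        attained = level
    return attained
```

For instance, documenting only the four Pilot fields yields `"pilot"`, and omitting any of them yields `None`.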

JeniT commented 11 years ago

There's also an argument for making some metadata completely optional (i.e. there in the checklist but not related to any level) if we think that it won't apply to some kinds of data. That's certainly the case when you get to the distribution-level metadata (as not all distributions are downloads).

Another possibility is to turn on/off some of the requirements based on previous answers in the questionnaire.
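A minimal Python sketch of that idea: each requirement carries a predicate over the earlier answers, and only requirements whose predicate holds are shown. The answer key `distribution_type`, the requirement names, and the rule that download metadata applies only to downloads are hypothetical, chosen to mirror the distribution example above.

```python
from typing import Callable, Dict, List, Mapping

Answers = Mapping[str, object]

# Hypothetical rules: each requirement is paired with a test on earlier answers.
APPLIES_WHEN: Dict[str, Callable[[Answers], bool]] = {
    "title": lambda answers: True,  # always asked
    "download_url": lambda answers: answers.get("distribution_type") == "download",
    "byte_size": lambda answers: answers.get("distribution_type") == "download",
}


def active_requirements(answers: Answers) -> List[str]:
    """List the requirements that apply given the answers so far."""
    return [name for name, applies in APPLIES_WHEN.items() if applies(answers)]
```

With `{"distribution_type": "api"}` only `title` remains active, so download-specific metadata would not count against a dataset published through an API.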