open-contracting / pelican-backend

Measures the quality of OCDS data
https://pelican-backend.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Coverage support for additional fields not defined in the release schema #30

Open jpmckinney opened 4 years ago

jpmckinney commented 4 years ago

Duncan: I want to see coverage of fields in extensions and additional fields (this is available in the views.field_counts table in Kingfisher). Example use case: Lindsey wanted to know whether the Australian federal data included any information on whether suppliers were Aboriginal-owned, which would likely be provided using an additional field or extension. I was able to check this using views.field_counts, but it would be good for program managers etc. to be able to check this directly in the Data Quality Tool. (+1: JM, CP)

James: The implementation of these checks will need to be a little different for non-schema fields. The tool presently stores pass/fail for each field in the schema, but for additional fields, it's impossible to record a failure until you've read all the inputs and know which fields are used. A different method would have to be used, where failures are inferred.
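To illustrate the "failures are inferred" idea, here is a minimal sketch, not the tool's actual implementation. The function names `iter_paths` and `additional_field_coverage`, the `schema_paths` argument, and the per-release granularity are all assumptions made for illustration. The first pass records only the additional fields each release uses; the second pass, once the full set is known, marks every other release as failing for that field.

```python
def iter_paths(value, prefix=""):
    """Yield the dotted path of every field in a JSON-like value, ignoring array indices."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from iter_paths(child, path)
    elif isinstance(value, list):
        for child in value:
            yield from iter_paths(child, prefix)


def additional_field_coverage(releases, schema_paths):
    """Return one {path: passed} dict per release, for non-schema paths only.

    `schema_paths` is a hypothetical, precomputed set of paths defined by the
    release schema (and any declared extensions).
    """
    # First pass: record which additional fields each release actually uses.
    seen_per_release = []
    all_additional = set()
    for release in releases:
        paths = set(iter_paths(release)) - schema_paths
        seen_per_release.append(paths)
        all_additional |= paths

    # Second pass: only now is the full set of additional fields known,
    # so a failure can be inferred wherever a release lacks one of them.
    return [
        {path: path in paths for path in sorted(all_additional)}
        for paths in seen_per_release
    ]


releases = [
    {"ocid": "a", "tender": {"id": 1, "aboriginalOwned": True}},
    {"ocid": "b", "tender": {"id": 2}},
]
coverage = additional_field_coverage(releases, {"ocid", "tender", "tender.id"})
```

In this sketch, the second release gets an inferred failure for `tender.aboriginalOwned` even though nothing about that field was known when the release was first read.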

Related: https://github.com/open-contracting/kingfisher-views/issues/29

James: Noting that the Kingfisher table stores aggregate results only. In the DQT, we store results per compiled release, so that it's possible to later build combined checks / filters like "covers both identifier.scheme and identifier.id".
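A small sketch of why per-release results matter for combined checks. The data layout (`per_release`, keyed by a made-up ocid) and the helper `covers_all` are hypothetical, and the paths are only examples. Aggregate counts alone cannot tell whether the same compiled release covers both fields.

```python
def covers_all(result, paths):
    """Return True if a single release's result passes for every given path."""
    return all(result.get(path, False) for path in paths)


# Hypothetical per-compiled-release coverage results.
per_release = {
    "ocds-1": {"parties.identifier.scheme": True, "parties.identifier.id": True},
    "ocds-2": {"parties.identifier.scheme": True, "parties.identifier.id": False},
    "ocds-3": {"parties.identifier.scheme": False, "parties.identifier.id": True},
}

wanted = ["parties.identifier.scheme", "parties.identifier.id"]
both = [ocid for ocid, result in per_release.items() if covers_all(result, wanted)]

# The aggregate view: two releases pass each field, but that does not reveal
# that only one release passes both.
aggregate = {path: sum(r[path] for r in per_release.values()) for path in wanted}
```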

jpmckinney commented 4 years ago

The earlier feedback also suggested having an indication of the number of additional fields (those not covered by OCDS or extensions) on the overview page.

Duncan: Number of additional fields (to get a high-level sense of how standardised the data is). Could consider a breakdown of this by stage, if others would find that useful.

Yohanna: Actually, I want to see in the overview whether or not there are extra fields that are not declared as extensions.
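A rough sketch of what the overview figure could look like, assuming the additional field paths have already been collected as dotted strings. The function `count_additional_by_stage`, the `STAGES` tuple, and the rule of grouping by the first path segment are assumptions for illustration only.

```python
from collections import Counter

# Top-level sections treated as stages here (an assumption for this sketch).
STAGES = ("planning", "tender", "awards", "contracts")


def count_additional_by_stage(additional_paths):
    """Count distinct additional field paths, grouped by their top-level section."""
    counts = Counter()
    for path in set(additional_paths):
        top = path.split(".", 1)[0]
        counts[top if top in STAGES else "other"] += 1
    return dict(counts)


paths = [
    "tender.aboriginalOwned",
    "awards.suppliers.indigenous",
    "awards.localRef",
    "customField",
]
summary = count_additional_by_stage(paths)
total = sum(summary.values())  # overall number of additional fields
has_additional = total > 0     # the yes/no indicator for the overview
```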

jpmckinney commented 4 years ago

From GitLab:

Also, as I understand it, the current check distinguishes between whether the field is not set at all and whether it is empty. It would also be useful to distinguish (in check result metadata) whether it is null, because null has a special meaning in OCDS compared to empty ({}, [], "").
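One way the three cases could be told apart in check result metadata, as a sketch only. `FieldState` and `field_state` are hypothetical names, not part of the existing checks.

```python
from enum import Enum


class FieldState(Enum):
    NOT_SET = "not_set"  # the key is absent from the object
    NULL = "null"        # the key is present and its value is null
    EMPTY = "empty"      # the value is {}, [] or ""
    SET = "set"          # any other value, including 0 and false


_MISSING = object()  # sentinel, so that an explicit null is not mistaken for absence


def field_state(obj, key):
    """Classify the state of `key` within the JSON object `obj`."""
    value = obj.get(key, _MISSING)
    if value is _MISSING:
        return FieldState.NOT_SET
    if value is None:
        return FieldState.NULL
    if isinstance(value, (dict, list, str)) and len(value) == 0:
        return FieldState.EMPTY
    return FieldState.SET


tender = {"title": "", "description": None, "items": [], "value": 0}
states = {key: field_state(tender, key).value for key in ("id", "title", "description", "items", "value")}
```

The sentinel is the design choice that matters here: using `obj.get(key)` without it would collapse "not set" and "null" into the same result.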