open-contracting / standard

Documentation of the Open Contracting Data Standard (OCDS)
http://standard.open-contracting.org/

Consider a "strict" schema to introduce JSON Schema validation keywords prior to 2.0 #1046

Open jpmckinney opened 4 years ago

jpmckinney commented 4 years ago

Several issues deal with incorrect or missing JSON Schema validation keywords. See milestone: https://github.com/open-contracting/standard/milestone/25

Typically, we would need to wait for a major version (2.0) to make changes that invalidate data from earlier versions (i.e. backwards-incompatible changes).

However, we can offer a voluntary, "strict" schema that adds in these keywords, removes deprecated fields and codes, removes deprecated types, etc.
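The thread doesn't say how the strict schema would be built. One possible approach is to keep the extra validation keywords in a separate overlay and deep-merge that overlay into a copy of the regular schema. The sketch below assumes that approach. The schema fragment, the overlay contents and the `deep_merge` helper are all hypothetical; the real keywords are the ones tracked in the milestone.

```python
import copy


def deep_merge(base, overlay):
    """Return a copy of `base` with `overlay` merged in recursively.

    Nested dicts are merged key by key; any other value in `overlay`
    replaces the corresponding value in `base`. `base` is not modified.
    """
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical fragment of the regular (1.x) schema.
regular = {
    "type": "object",
    "properties": {
        "ocid": {"type": "string"},
        "tag": {"type": "array", "items": {"type": "string"}},
    },
}

# Hypothetical overlay of validation keywords. Adding these to the regular
# schema would invalidate some existing data, so they are applied only
# in the voluntary strict variant.
strict_keywords = {
    "properties": {
        "ocid": {"minLength": 1},
        "tag": {"minItems": 1, "uniqueItems": True},
    },
}

strict = deep_merge(regular, strict_keywords)
print(strict["properties"]["ocid"])  # {'type': 'string', 'minLength': 1}
```

Because the overlay is kept separate, the regular schema stays unchanged and remains backwards-compatible.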

Users of the Data Review Tool could voluntarily select the strict schema. Similarly, OCDS Helpdesk analysts could tell publishers how to improve their data so that it passes the stricter criteria.

duncandewhurst commented 2 years ago

> However, we can offer a voluntary, "strict" schema that adds in these keywords, removes deprecated fields and codes, removes deprecated types, etc.

It might be preferable to keep deprecated fields, codes, and types. If we remove them, the DRT will report them as additional fields. It is then less clear what to do about them than if they were flagged as deprecated, because the deprecation notes usually explain what to do instead.
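This simplified sketch shows why a removed field becomes "additional". It reports any field in the data that the schema doesn't declare. It only follows `properties` and `items` and ignores `$ref`, `patternProperties` and combinators. The schema fragments are illustrative, not the real OCDS schema. The `deprecated` object is modelled loosely on the one OCDS uses. The helper name is hypothetical.

```python
def additional_fields(data, schema, path=""):
    """Yield slash-separated paths of fields in `data` not declared in `schema`."""
    if isinstance(data, dict):
        properties = schema.get("properties", {})
        for key, value in data.items():
            child_path = f"{path}/{key}"
            if key in properties:
                yield from additional_fields(value, properties[key], child_path)
            else:
                yield child_path
    elif isinstance(data, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            yield from additional_fields(item, item_schema, f"{path}/{index}")


# Illustrative fragments: one keeps a deprecated field, one drops it.
amendment = {
    "type": "object",
    "deprecated": {"deprecatedVersion": "1.1"},
    "properties": {"rationale": {"type": "string"}},
}
schema_keeping_deprecated = {
    "properties": {
        "tender": {"properties": {"id": {"type": "string"}, "amendment": amendment}}
    }
}
schema_dropping_deprecated = {
    "properties": {"tender": {"properties": {"id": {"type": "string"}}}}
}

release = {"tender": {"id": "t1", "amendment": {"rationale": "Extended deadline"}}}

print(list(additional_fields(release, schema_dropping_deprecated)))  # ['/tender/amendment']
print(list(additional_fields(release, schema_keeping_deprecated)))  # []
```

If the field is dropped, the only signal is an unexplained "additional field". If it is kept, a tool can still find the deprecation note and show it.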

jpmckinney commented 2 years ago

Yes, we can keep the deprecated fields and codes.

The DRT (and the schema) say nothing about deprecated types (e.g. "number" for ID fields), so it is better to remove them and cause a structural error.
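Here is a minimal sketch of removing the deprecated numeric type from ID fields, so that numeric IDs fail type validation. The thread calls this type "number", but the sketch strips both "number" and "integer". That choice, and the assumption that ID fields are properties named "id" declared with a list of types, are assumptions, not a description of the actual OCDS tooling. The function name is hypothetical.

```python
import copy

# Numeric JSON Schema types treated as deprecated for ID fields (assumption).
NUMERIC_TYPES = {"number", "integer"}


def remove_numeric_id_types(schema):
    """Return a copy of `schema` in which properties named "id" no longer
    accept numeric types. The whole tree is walked, so fields under
    "definitions" are handled too; $ref is not resolved.
    """
    schema = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            properties = node.get("properties")
            if isinstance(properties, dict):
                id_field = properties.get("id")
                if isinstance(id_field, dict) and isinstance(id_field.get("type"), list):
                    kept = [t for t in id_field["type"] if t not in NUMERIC_TYPES]
                    if kept:
                        # Collapse a one-element list to a plain type string.
                        id_field["type"] = kept[0] if len(kept) == 1 else kept
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema


# Illustrative fragment, not copied from the OCDS schema.
schema = {
    "definitions": {
        "Award": {"type": "object", "properties": {"id": {"type": ["string", "integer"]}}}
    }
}
strict = remove_numeric_id_types(schema)
print(strict["definitions"]["Award"]["properties"]["id"]["type"])  # string
```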

duncandewhurst commented 2 years ago

Sounds good!

duncandewhurst commented 2 years ago

Are there any other types that we should remove, apart from "number" for ID fields and "null" for required fields?
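For the second case, here is a similar sketch that drops "null" from the type of every field listed in an object's `required` array. It assumes that `required` and `properties` are declared in the same subschema and doesn't resolve `$ref`. The function name is hypothetical.

```python
import copy


def remove_null_from_required(schema):
    """Return a copy of `schema` in which every field listed in an object's
    "required" array no longer accepts null.
    """
    schema = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            properties = node.get("properties")
            required = node.get("required")
            if isinstance(properties, dict) and isinstance(required, list):
                for name in required:
                    field = properties.get(name)
                    if isinstance(field, dict) and isinstance(field.get("type"), list):
                        kept = [t for t in field["type"] if t != "null"]
                        if kept:
                            # Collapse a one-element list to a plain type string.
                            field["type"] = kept[0] if len(kept) == 1 else kept
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema


# Illustrative fragment: "ocid" is required, "title" is optional.
fragment = {
    "type": "object",
    "required": ["ocid"],
    "properties": {
        "ocid": {"type": ["string", "null"]},
        "title": {"type": ["string", "null"]},
    },
}
strict = remove_null_from_required(fragment)
print(strict["properties"]["ocid"]["type"])   # string
print(strict["properties"]["title"]["type"])  # ['string', 'null']
```

Optional fields keep "null", so only data that omits or nulls a required field starts to fail.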

jpmckinney commented 2 years ago

Not that I'm aware of. I think most fields are single-type, but we can run some code to find any other multi-type fields.
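The thread doesn't include the script that was used. Here is a hypothetical sketch of such a check. It walks a schema and yields the path of every subschema whose `type` lists more than one type. Paths are slash-separated and unescaped, which assumes no key contains "/" or "~".

```python
def multi_type_fields(schema, path=""):
    """Yield (path, types) for every subschema whose "type" is a list of
    more than one type."""
    if isinstance(schema, dict):
        types = schema.get("type")
        if isinstance(types, list) and len(types) > 1:
            yield path or "/", types
        for key, value in schema.items():
            yield from multi_type_fields(value, f"{path}/{key}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            yield from multi_type_fields(item, f"{path}/{index}")


# Illustrative fragment.
schema = {
    "properties": {
        "id": {"type": ["string", "integer"]},
        "title": {"type": "string"},
    }
}
for pointer, types in multi_type_fields(schema):
    print(pointer, types)  # /properties/id ['string', 'integer']
```

To run it against a real schema, pass it the result of `json.load` on a local copy of a schema file such as `release-schema.json`.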

duncandewhurst commented 2 years ago

I checked using a Python script and the only other multi-type fields are:

duncandewhurst commented 6 months ago

Noting that #1480 doesn't close this issue because we still need to prepare a second PR to add the documentation discussed in #1480. Once the docs are ready, we can run the manage.py command to generate the strict files, and then merge.

jpmckinney commented 6 months ago

> prepare a second PR to add the documentation discussed in https://github.com/open-contracting/standard/pull/1480.

Copying relevant content here for easier reference:

From https://github.com/open-contracting/standard/pull/1480#issuecomment-1053793225

Do you have a view on if and how we should feature the strict schema(s) in the documentation?

I figure few users would do anything with the actual JSON schema, but adding schema viewers for the strict versions of the release, release package and record package schemas seems overly duplicative. I think we should include some mention of it in the docs if we are going to include it in the DRT and helpdesk feedback processes. We could either add an admonition to https://standard.open-contracting.org/staging/1.2-dev/en/schema/# or add a new page under the reference section that summarizes the differences from the regular schema and directs users to the DRT to check their data against the strict schema.

https://github.com/open-contracting/standard/pull/1480#issuecomment-1071242289

> Do you have a view on if and how we should feature the strict schema(s) in the documentation?

I think a concise admonition as you suggest would be best. It should briefly describe why the strict schema is preferred for new implementations (and how it can be useful to improve data quality of existing implementations).

In terms of duplication, we can consider using https://sphinx-design.readthedocs.io/en/latest/ to allow browsing of each schema. That way it's not taking up a lot of the page.

https://github.com/open-contracting/standard/pull/1480#issuecomment-1071818457

> In terms of duplication, we can consider using https://sphinx-design.readthedocs.io/en/latest/ to allow browsing of each schema. That way it's not taking up a lot of the page.

I tried this approach, but only the first docson widget is displayed. The second widget throws "Uncaught Error: only one instance of babel-polyfill is allowed" (see lbovet/docson#73).

In other projects that have multiple docson widgets per page, we use https://github.com/OpenDataServices/docson, which is a fork of an earlier version of docson that doesn't use babel-polyfill. However, https://github.com/open-contracting/docson makes several improvements that would be lost if we used the other fork.

I found https://www.npmjs.com/package/idempotent-babel-polyfill/, which might be a solution, but I don't know enough JavaScript to update docson to use it.