Introduction
Since #103 we have had no static JSON schema files; instead, we validate the JSON files internally against dynamically generated schemas (see here). We did this because writing these JSON schemas by hand, and updating them every time this Python package changes, takes a lot of effort.
However, I think it is still good to have a versioned JSON schema available somewhere, for FAIRness: mainly interoperability with other software, such as existing JSON validators and this thing.
Proposal
Since we already have a lot of the necessary infrastructure in place, my proposal is as follows:
Make this Python package the official source for the generative metadata format
Align the version of the JSON schema with the version of this package
With every tagged release, automatically generate the core JSON schema and push it to the GMF repository using a GitHub Action. I've already run a small test that works here; see the artifact it produces.
This aligns well with our recent changes: adding a CLI (#142) and including a Docker container (#150).
Changes
The changes are mainly in the validation script (using __version__ for the schema base). We could add schema generation to the CLI, which would make the GitHub Action simpler and allow us to (re)generate previous schema versions using our versioned Docker containers. Most of the work will be in creating the GitHub Action that pushes to a different repo (which we already do for our website, so that's definitely possible).
The GMF repository will be kind of "broken", as the versions will change. I think this is okay.
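As a rough illustration of the schema generation step described above, here is a minimal sketch using only the standard library. It assumes a hard-coded `__version__` and a placeholder base URL; `build_core_schema`, `write_schema`, the file layout and the schema contents are all hypothetical and stand in for whatever the validation script actually generates.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins: the real package exposes its own __version__,
# and the real base URL for the GMF schema is an assumption here.
__version__ = "1.2.3"
SCHEMA_BASE_URL = "https://example.org/gmf/schema"


def build_core_schema(version: str) -> dict:
    """Return a minimal JSON schema whose $id embeds the package version."""
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": f"{SCHEMA_BASE_URL}/{version}/core.schema.json",
        "title": "Generative Metadata Format (core)",
        "type": "object",
        # Placeholder property; the real schema would be generated
        # dynamically from the package, not written by hand.
        "properties": {"version": {"const": version}},
        "required": ["version"],
    }


def write_schema(out_dir: Path, version: str = __version__) -> Path:
    """Write the schema to <out_dir>/<version>/core.schema.json."""
    target = out_dir / version / "core.schema.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(build_core_schema(version), indent=2) + "\n")
    return target


if __name__ == "__main__":
    # Write into a temporary directory so the sketch has no side effects.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_schema(Path(tmp))
        print(path.relative_to(tmp))
```

Exposing something like `write_schema` through the CLI would let the GitHub Action, or a versioned Docker container, produce the file for any released version with a single command.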
Out of scope
We should decide at a later date how to include or otherwise deal with plugins.
A relevant package we might want to look at in relation to this is pydantic, which does a lot of nice data validation and automatically builds JSON schemas.