Open kathryn-ods opened 3 months ago
@radix0000 does it make sense to implement this test at this point in time? Because the data is only invalid if the major values don't match, one of the invalid test cases would need to include statements with e.g. "1.0" and "0.4". As 1.0 doesn't exist yet, would that be flagged up for not being a valid BODS version as well as for having inconsistent values?
@kd-ods you might be able to advise on the above now you're back
This is a special kind of check, since the outcome relates to how the whole dataset is processed. I think we should hold off implementing this. Pre-v1 things are having to be handled a little differently.
For future reference this is where I think we are and where we are going:
When it comes to the DRT 'choosing' which version of the schema to validate a dataset against, it looks at the first statement in the dataset and:
if there is no publicationDetails.bodsVersion field, validates against BODS 0.1
if there is a publicationDetails.bodsVersion field with a valid BODS version, validates against it
if there is a publicationDetails.bodsVersion field with an invalid BODS version, validates against the latest version of BODS
@radix0000 - is that right? (We should document exactly what the process is.)
This check, that 'all Statements MUST have the same major version number', is done as part of the initial parsing of the data.
Pass: (a) no statements have a publicationDetails.bodsVersion field or (b) all statements have a publicationDetails.bodsVersion field and all Statements have the same major version number.
Fail: (c) some statements have a publicationDetails.bodsVersion field and some don't or (d) all statements have a publicationDetails.bodsVersion field but not the same major version number.
On fail: the dataset is not validated and the user gets an informative error message
On pass (case (a)): the dataset is validated against BODS 0.1
On pass (case (b)): the dataset is validated against the latest MINOR.PATCH version release for the given MAJOR version number.
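The four cases above could be sketched roughly as follows. This is only an illustration, not the DRT's actual code: the function names, the KNOWN_VERSIONS tuple and the choice to treat a non-string bodsVersion as absent are all assumptions made for the example.

```python
from typing import List, Optional

# Assumed list of released BODS versions; illustrative only.
KNOWN_VERSIONS = ("0.1", "0.2", "0.3", "0.4")


def get_bods_version(statement: dict) -> Optional[str]:
    """Return publicationDetails.bodsVersion if it is a string, else None."""
    details = statement.get("publicationDetails")
    if not isinstance(details, dict):
        return None
    version = details.get("bodsVersion")
    return version if isinstance(version, str) else None


def classify_dataset(statements: List[dict]) -> str:
    """Return 'a' or 'b' (pass) or 'c' or 'd' (fail), as labelled above."""
    versions = [get_bods_version(s) for s in statements]
    present = [v for v in versions if v is not None]
    if not present:
        return "a"  # no statement declares a version
    if len(present) < len(versions):
        return "c"  # some statements declare a version, some don't
    majors = {v.split(".")[0] for v in present}
    return "b" if len(majors) == 1 else "d"


def latest_for_major(major: str) -> Optional[str]:
    """Pick the highest known release for a MAJOR number (case (b))."""
    candidates = [v for v in KNOWN_VERSIONS if v.split(".")[0] == major]
    return max(
        candidates,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        default=None,
    )
```

Under these assumptions a dataset mixing "0.3" and "0.4" falls into case (b) and would be validated against 0.4, while one mixing "0.4" and "1.0" falls into case (d).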
Having worked through all that... maybe post BODS v1 we should actually do a complete overhaul of the DRT too. We could relegate work so far to a 'beta' version then clean everything up for a v1 of the DRT. Then direct pre-BODS v1 users to the beta version of the tool and BODS v1+ users to the new release. Then we don't need to maintain any overly-complicated BODS version-handling.
@kd-ods Re DRT choosing a schema version, it is slightly more complicated than that (because as well as bodsVersion not being present, the cases where it isn't a string, or isn't in the list of known versions, need to be covered). The main tweak I have introduced is that it detects whether the data is record-based (i.e. if it has "recordDetails", "recordId" or "recordType" in the statement), and if so it doesn't use BODS 0.1 as the default; instead it uses the latest version (i.e. currently 0.4). Having these 2 categories, record-based and non-record-based, with different defaults for each seems sensible to me (given how different they are) but let me know what you think. There is a question of what the best defaults are as well (e.g. out of 0.1, 0.2, and 0.3 what is the "most used" version and should we be using that as the default for non-record-based data?).
Ah, thanks @radix0000. So is this a correct summary of what happens atm?
The entire dataset is validated against a single schema version.
The schema version is selected based on the contents of the first Statement in the array.
If that first statement is 'record-based' the whole dataset is validated against bodsVersion (if it is present and valid). If that field is missing or invalid then validation is against BODS 0.4.
If that first statement is not record-based the whole dataset is validated against bodsVersion (if it is present and valid). If that field is missing or invalid then validation is against BODS 0.1.
(If so - that looks sensible to me.)
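For future reference, the selection described in this summary might look something like the sketch below. It is an illustration only, not the DRT's actual implementation: the function and constant names are made up, the list of known versions is assumed, and the handling of an empty dataset is a guess.

```python
from typing import List

# Assumed values for illustration; the real tool may differ.
KNOWN_VERSIONS = ("0.1", "0.2", "0.3", "0.4")
RECORD_KEYS = ("recordDetails", "recordId", "recordType")
RECORD_BASED_DEFAULT = "0.4"  # latest version at the time of this thread
NON_RECORD_DEFAULT = "0.1"


def is_record_based(statement: dict) -> bool:
    """A statement counts as record-based if it has any record* key."""
    return any(key in statement for key in RECORD_KEYS)


def select_schema_version(statements: List[dict]) -> str:
    """Choose one schema version for the whole dataset from the first statement."""
    # Assumption: an empty or malformed dataset is treated like a
    # non-record-based statement with no declared version.
    first = statements[0] if statements else {}
    if not isinstance(first, dict):
        first = {}

    details = first.get("publicationDetails")
    version = details.get("bodsVersion") if isinstance(details, dict) else None

    # Use the declared version only if it is a string in the known list.
    if isinstance(version, str) and version in KNOWN_VERSIONS:
        return version
    return RECORD_BASED_DEFAULT if is_record_based(first) else NON_RECORD_DEFAULT
```

Only the first statement is inspected here, which is why the separate major-version consistency check across all statements is still needed.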
Currently in cove as inconsistent_schema_version_used. This needs to be rewritten to allow for inconsistent minor versions.
Check: all Statements MUST have the same major version number.
On fail:
Error message: Statements have different major version numbers.
Info message: Version number (bodsVersion): [VALUE], Version number (bodsVersion): [VALUE2]
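A rough sketch of how the rewritten check might produce these messages while tolerating different minor versions. The function name and return shape are hypothetical, and reporting the first version seen for each major number is an assumption about what [VALUE] and [VALUE2] should contain.

```python
from typing import Dict, List, Optional


def check_major_versions(versions: List[str]) -> Optional[Dict[str, str]]:
    """Return None on pass, or the error and info messages on fail."""
    # Keep the first bodsVersion seen for each major number
    # (dicts preserve insertion order).
    first_seen_by_major: Dict[str, str] = {}
    for version in versions:
        major = version.split(".")[0]
        first_seen_by_major.setdefault(major, version)

    if len(first_seen_by_major) <= 1:
        return None  # same major everywhere; minor versions may differ

    info = ", ".join(
        f"Version number (bodsVersion): {value}"
        for value in first_seen_by_major.values()
    )
    return {
        "error": "Statements have different major version numbers.",
        "info": info,
    }
```

With this sketch, ["0.3", "0.4"] passes, while ["0.4", "1.0"] fails and reports both values in the info message.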