openownership / lib-cove-bods

Check that your data complies with the Beneficial Ownership Data Standard (BODS) using our install our data review library to analyse files via your command line interface
https://datareview.openownership.org/
Other
1 stars 0 forks source link

Unhelpful validation error message #64

Closed kd-ods closed 2 years ago

kd-ods commented 3 years ago

(I hope this is the right repo for this issue.) Try running this json through the data review tool:

[ { "annotations": [ { "motivation": "correcting" } ], "addresses": [ { "country": "EE", "type": "residence" } ], "birthDate": "1985-02", "identifiers": [ { "id": "280871-*****", "schemeName": "GTR-PASSPORT" } ], "isComponent": false, "names": [ { "familyName": "Nonymou", "fullName": "A Nonymous", "givenName": "A", "type": "individual" } ], "nationalities": [ { "code": "GB" } ], "personType": "knownPerson", "publicationDetails": { "bodsVersion": "0.2", "license": "http://creativecommons.org/publicdomain/zero/1.0/", "publicationDate": "2020-11-10", "publisher": { "name": "Test Register", "url": "http://www.test.com" } }, "source": { "assertedBy": [ { "name": "Me" } ], "type": [ "selfDeclaration", "officialRegister" ] }, "statementDate": "2020-10-30", "statementID": "d5764567474-d2a5-4e89-be41-614crgdtgsdgsgsfg", "statementType": "personStatement" } ]

It throws the error: '{'motivation': 'correcting'} is not valid under any of the given schemas'. This does not help with troubleshooting. A better message would be:

'{'motivation': 'correcting'} is not valid according to the BODS schema. See https://github.com/openownership/data-standard/tree/master/schema.

Or if we could link to the relevant place in the BODS schema that would be even better. (In this case:

https://github.com/openownership/data-standard/blob/master/schema/components.json#L394, OR

https://github.com/openownership/data-standard/blob/7693b43c7d506106196bd83c7cd440e57236341d/schema/components.json#L329)

kd-ods commented 3 years ago

To clarify the above. Because of the json schema logic, the user is told "{'motivation': 'correcting'} is not valid under any of the given schemas", rather than the more useful information: statementPointerTarget is a required property of an annotation.

So, if this isn't fixable with a change to the json schema logic, then pointing to the relevant point in the schema is the next best option.

kd-ods commented 2 years ago

Just noticed that the existing schema logic also causes/relates to this issue: https://github.com/openownership/data-standard/issues/333

odscjames commented 2 years ago

This is caused by use of "anyOf" - if the data does not match the JSON Schema validator doesn't know which one of the onyOf clauses to use, hence the unhelpful error message. If we could remove the "anyOf" that would be a simple fix, but having just had a quick look I'm not sure we can. We fixed a similar problem with regards to better errors with the 3 statement types, but this might be harder to solve as for that one we could use the "type" key to choose the correct schema and this time there is no such key.

odscjames commented 2 years ago

I'm looking into our custom validator. I've managed to extend it to provide better errors - it will try to select the subschema that is the right one to use based on the selector field then show the user errors, so the user sees "field is required" errors.

Working in isolation at https://gist.github.com/odscjames/3e510bebdaa736b53ad2b80f96d6e43d but when I try and put it into Cove I get problems - I'll keep looking.

Also noting that we could remove the anyOf totally here if we instead added a simple Python check - if an annotation is of type linking then url is a required field.

odscjames commented 2 years ago

this time there is no such key.

So I was wrong about that. There is such a key - the motivation key. I was able to reuse the same mechanism to pick which sub schema to use, and then we get the direct errors from the subschema like:

oneOfErrors

https://github.com/OpenDataServices/lib-cove/pull/96 does this, however I've tried to do it in a general way so that we can activate this feature in schemas in the future, and not have to change libraries.

This involves some changes to the BODS schema that I'm hoping we can release as patch releases to existing 0.2 and maybe 0.1.

(If this isn't possible there are other ways possible)

So if that is fine, the process is:

So I'm waiting for feedback on whether changes to standard are ok? @kd-ods

kd-ods commented 2 years ago

My overall question is: can we focus on making these changes in the main branch of the schema first, and - for the moment - leave consideration of whether to issue a 0.21 patch?

I'm also not clear on what 'release' means above and how lib-cove-bods changes need to be timed with an actual BODS release.

Also - the new key "oneOfEnumSelectorField": "motivation": is there anything like this in the latest version of JSON schema, or are we going out on a limb?

odscjames commented 2 years ago

can we focus on making these changes in the main branch of the schema first

Yes

leave consideration of whether to issue a 0.21 patch?

(Aside: Does this mean 0.2.1?)

I'm also not clear on what 'release' means above

Release a new version of the software library / standard with a new version number. In the case of the software libraries we do that all the time so it's not a big deal, but I want to talk to you about the options for the standard.

how lib-cove-bods changes need to be timed with an actual BODS release.

All the parts need to be done before this starts working but they don't need to be released together in any way - we can release in any order at any time and that shouldn't cause any issues.

the new key "oneOfEnumSelectorField": "motivation": is there anything like this in the latest version of JSON schema, or are we going out on a limb?

This is new and not JSON Schema standard. But we already have non-standard things like codelists, titles in $ref's, ... - that's part of what https://compiletojsonschema.readthedocs.io/en/latest/ does, turn it back into a standard JSON Schema.

kd-ods commented 2 years ago

Thanks for all that, James....

(Aside: Does this mean 0.2.1?)

Yes.

So, I think what I'd say is: check with @rhiaro tomorrow about making those changes on the main branch of the schema. You might want to work on @rhiaro's branch to coordinate the changes?

rhiaro commented 2 years ago

I don't like using schema properties to provide config options to the validator, which oneOfEnumSelectorField is doing. It's not Data, as in it isn't something someone publishing BODS would fill in a value for. It would be out of place to show up in the schema reference or schema browser documentation. It's not something an end user of the data - either publisher or consumer - should have to think about.

If there's a better way to pass this information to the validator (specifically cove, as any other hypothetical BODS validator may do things a different way) we should look at that instead of adding to the schema in this case.

odscjames commented 2 years ago

Discussed with @kd-ods this morning - we'll put in schema for now.

odscjames commented 2 years ago

This is not released yet. 2 things need to happen for this to be released:

Discussed with @kd-ods this morning: given the size of problem we are happy to wait for these and thus I am now closing this.