lwjohnst86 opened 1 week ago
The `frictionless` R package has a nice design, along with some checks, e.g. https://github.com/frictionlessdata/frictionless-r/blob/main/R/check_package.R. Could be used for inspiration.
For metadata, the default behaviour of `jsonschema` is to run all checks at once. We could customise this either by modifying the schema (e.g. extracting a sub-schema from it) or by using the whole schema and filtering out errors we want to ignore.
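The filtering option could look something like this minimal sketch. The schema and the `check_properties` helper are stand-ins for illustration, not the real Data Package schema or our actual API:

```python
# Sketch: validate against the whole schema, then filter out errors we
# chose to ignore. SCHEMA is a tiny stand-in for the Data Package schema.
from jsonschema import Draft7Validator

SCHEMA = {
    "type": "object",
    "required": ["name", "resources"],
    "properties": {
        "name": {"type": "string"},
        "resources": {"type": "array"},
    },
}

def check_properties(properties: dict, ignore=frozenset()) -> list:
    """Collect all validation errors, dropping ignored check keywords."""
    validator = Draft7Validator(SCHEMA)
    return [
        error
        for error in validator.iter_errors(properties)
        # `error.validator` is the failing schema keyword, e.g. "required"
        # or "type"; skip errors whose keyword is in `ignore`.
        if error.validator not in ignore
    ]

# All checks at once:
errors = check_properties({"name": 123})
# Same schema, but ignoring `required` errors:
partial_errors = check_properties({"name": 123}, ignore={"required"})
```

The nice part of this route is that the schema itself stays untouched; only the reporting changes.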
As for whether there are scenarios where a partial check would be useful/necessary, I think one candidate is when we are building the metadata and it is not yet complete (e.g. before adding resources to a package). Here we presumably want to disable required checks for *some* fields?
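The other customisation route, extracting a sub-schema, might look like this sketch for the "metadata still being built" case. The schema and the `without_required` helper are hypothetical stand-ins:

```python
# Sketch: derive a sub-schema with `required` stripped, for checking a
# package that is still being built (e.g. no resources added yet).
import copy

from jsonschema import Draft7Validator

FULL_SCHEMA = {
    "type": "object",
    "required": ["name", "resources"],
    "properties": {
        "name": {"type": "string"},
        "resources": {"type": "array", "items": {"type": "object"}},
    },
}

def without_required(schema: dict) -> dict:
    """Drop `required` at the top level and from direct sub-schemas."""
    schema = copy.deepcopy(schema)
    schema.pop("required", None)
    for sub_schema in schema.get("properties", {}).values():
        sub_schema.pop("required", None)
    return schema

building = {"name": "my-package"}  # no resources yet
errors = list(Draft7Validator(without_required(FULL_SCHEMA)).iter_errors(building))
# `building` passes even though `resources` is missing, while the
# untouched FULL_SCHEMA would still flag it once the package is "done".
```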
Just for me to keep track of questions:

- Should we check the `path` field for resources?
- Keep `NotPropertiesError`, but (1) add an `errors` attribute to hold error objects (useful for components built on top of `core`) and (2) add a human-readable summary message?
- For `datapackage.json`, regardless of what is being updated, we always write the whole properties structure, so running one check function makes sense.
- Should we check on load that `datapackage.json` is correct/complete? Maybe we actually want to allow users to correct bad/incomplete metadata, in which case we don't want to fail if metadata is incomplete or otherwise incorrect. We only want to make sure that we're getting JSON from `datapackage.json`. As for extra/custom fields coming from `datapackage.json`, the standard says those are allowed, so we don't want to fail for those either.
- All `jsonschema` errors are the same type (`ValidationError`), which is nice for components using `cli`.
- Our own custom checks would not raise `ValidationError`, so we would need a further layer of abstraction on top of the error classes; as the DP standard has its own required fields, we still wouldn't have a clear split between required-checks and well-formed-checks.
- If we use `jsonschema`'s `validate`, then we can collect all errors easily and all errors will be the same type. 🎉
- `NotPropertiesError` can expect `ValidationError`s and generate a summary of them for the error message.
- With `jsonschema`'s custom validator, it's straightforward to check blank values correctly (i.e. no extra code needed to establish the type of a required field and the corresponding blank value).

I'll try to respond to as many of these questions as I can:
I agree with you that having a single check function that runs against their schema via `jsonschema` makes the most sense, rather than artificially splitting it.
I think there should be a check on load as well, since it is entirely possible that at some point someone makes a change directly in the file. So this starts me thinking: instead of throwing an error, can we report incorrectly written fields as warnings during the computation/checks, maybe with some suggestion on how to fix them? If we can't always guarantee that the file will be correct, we could say something like "this is a warning for now, but could cause problems later, please fix now with ...".
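A minimal sketch of that warn-don't-fail idea on load. The `load_properties` helper and the schema are hypothetical; only malformed JSON is treated as a hard error:

```python
# Sketch: on load, report schema problems as warnings with a hint,
# instead of raising. SCHEMA stands in for the real Data Package schema.
import json
import warnings

from jsonschema import Draft7Validator

SCHEMA = {"type": "object", "required": ["name"]}

def load_properties(text: str) -> dict:
    """Load datapackage.json content, warning (not failing) on bad metadata."""
    properties = json.loads(text)  # only non-JSON input raises here
    for error in Draft7Validator(SCHEMA).iter_errors(properties):
        warnings.warn(
            f"{error.message} -- this is a warning for now, but could "
            "cause problems later; please fix it.",
            UserWarning,
        )
    return properties

# A file edited by hand, missing the `name` field, still loads:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    properties = load_properties("{}")
```

The user keeps a working object to correct, and the warnings tell them what to fix.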
Input from the user should definitely be checked, so we can give warnings/errors before the problem propagates. But again, maybe only warnings for now, and we can decide later how to handle specific cases of errors.
Yes, I definitely think we need our own checks. Internally there would be two sets, but externally they would be bundled together. I feel (but could be wrong) that adding our own checks back into the schema would get complicated, and that doing simple checks for things as we go would be easier. But I can't envision very well how this all looks right now.
Could we not use `ValidationError` for our own custom checks?
I think I like "collect them all and then inform the user" rather than "user runs, gets an error/warning, has to fix it, runs again, gets another error, and so on".
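A sketch of the collect-them-all approach, assuming a `NotPropertiesError` shaped like the one discussed above (its exact interface is still open):

```python
# Sketch: gather every ValidationError, then raise one summary error
# that keeps the raw error objects for components built on top of core.
from jsonschema import Draft7Validator
from jsonschema.exceptions import ValidationError

class NotPropertiesError(Exception):
    """Raised with all collected errors plus a human-readable summary."""

    def __init__(self, errors: list):
        self.errors = errors  # raw ValidationError objects, kept for reuse
        summary = "\n".join(
            f"- {'/'.join(str(p) for p in error.path) or '$'}: {error.message}"
            for error in errors
        )
        super().__init__(f"Invalid properties:\n{summary}")

SCHEMA = {
    "type": "object",
    "required": ["name"],
    "properties": {"title": {"type": "string"}},
}

def check(properties: dict) -> dict:
    errors = list(Draft7Validator(SCHEMA).iter_errors(properties))
    if errors:
        raise NotPropertiesError(errors)
    return properties
```

With this shape, a single failed run reports every problem at once instead of one per run.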
Yes!
I don't have enough understanding of this particular thing to have a strong opinion. We can always implement something and see how it works as we test it and go along.
For instance, run against everything in the schema at once, or only specific things?