JSON Schema, fundamentally - expression based assertions

hrennau commented 4 years ago

Issue 823 "Precision and Scale for Decimal Values" illustrates a fundamental question - which kinds of constraints are general enough to warrant inclusion in the standard vocabulary? In cases of doubt, there is always a dilemma - keep the constraint language small and clear, but lose expressiveness, or gain expressiveness and lose clarity and coherence.

Other validation languages have dealt with this dilemma by supporting expression based assertions, which are expressions (or something close to that) defined in some kind of expression language. Expressions enable the schema author to define constraints in a free way, beyond any fixed vocabulary. For example, XSD version 1.1 added XPath assertions, and the specification of SHACL (RDF validation language) has two parts - one defining the core vocabulary and the other one adding support for assertions defined by SPARQL queries. The rationale behind such an approach could not be described more clearly than by this paragraph from the SHACL specification, which opens "Part 2: SHACL-SPARQL":

Part 1 of this specification introduced features that are built into the Core of SHACL. The goal of this Core is to provide a high-level vocabulary for common use cases to describe shapes. However, SHACL also provides mechanisms to go beyond the Core vocabulary and represent constraints with greater flexibility. These mechanisms, called SHACL-SPARQL, are described in the following sections.

- https://www.w3.org/TR/shacl/#sparql-constraints (paragraph immediately preceding this anchor)

I am aware of the fundamental character of such a step as adding support for expression based assertions. We are not talking of an issue, but of considerations of strategic scope. I raise the topic here, however, because I would love to hear about the current state of considerations, goals and non-goals, plans etc. touching upon possible support for expression based assertions.

gregsdennis commented 4 years ago

Looking at the example given in the link you provided, these kinds of assertions can already be achieved with current JSON Schema assertions.

SELECT $this (ex:germanLabel AS ?path) ?value
WHERE {
    $this ex:germanLabel ?value .
    FILTER (!isLiteral(?value) || !langMatches(lang(?value), "de"))
}

If I'm reading this right, it says the value of ex:germanLabel needs to have a de language tag. I expect this structure to be implemented in JSON in one of two ways.

String literal

If the whole value "Spain"@en is a string literal, then it would be something like {"ex:germanLabel":"\"Spain\"@en"}. This is cumbersome and probably not the right way to do it, but it can be handled quite easily with the regular expression .*"@de$, which requires the string for germanLabel to end in "@de. For this, use the pattern keyword.
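As a rough illustration of what that pattern accepts and rejects, here is a minimal Python sketch. It assumes Python's re module behaves like the schema's regex dialect for this simple pattern; the helper name has_german_tag is made up for the example.

```python
import re

# The pattern from the comment above. JSON Schema's "pattern" keyword is
# unanchored, so re.search is the closest stdlib equivalent.
GERMAN_TAG = re.compile(r'.*"@de$')


def has_german_tag(literal: str) -> bool:
    """Return True if the string-encoded literal ends in "@de (hypothetical helper)."""
    return GERMAN_TAG.search(literal) is not None


print(has_german_tag('"Spanien"@de'))  # True
print(has_german_tag('"Spain"@en'))    # False
```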

Object

A better way to represent this would be to have an object:

{
  "ex:germanLabel": {
    "value": "Spain",
    "lang": "en"
  }
}

This can be handled with various other assertions that already exist, the simplest would be a const keyword.

{
  "properties": {
    "ex:germanLabel": {
      "type": "object",
      "properties": {
        "value": { "type": "string" },
        "lang": { "const": "de" }
      }
    }
  }
}
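To make the semantics of that schema concrete, the following hand-rolled Python check mirrors it keyword by keyword. It is only a sketch, not a JSON Schema implementation; the function name check_german_label is an assumption for illustration. Note that "properties" does not make any key required, so an instance without ex:germanLabel still passes.

```python
def check_german_label(instance) -> bool:
    """Hand-written mirror of the schema above (illustrative only)."""
    # "properties" only constrains objects, and only keys that are present.
    if not isinstance(instance, dict) or "ex:germanLabel" not in instance:
        return True
    label = instance["ex:germanLabel"]
    # "type": "object"
    if not isinstance(label, dict):
        return False
    # "value": { "type": "string" }
    if "value" in label and not isinstance(label["value"], str):
        return False
    # "lang": { "const": "de" }
    if "lang" in label and label["lang"] != "de":
        return False
    return True


print(check_german_label({"ex:germanLabel": {"value": "Spanien", "lang": "de"}}))
print(check_german_label({"ex:germanLabel": {"value": "Spain", "lang": "en"}}))
```

The first call passes and the second fails on the const check.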

Another difficulty here is defining the list of available functions, for example langMatches in the FILTER expression. These would have to be implemented by the library; arbitrary functions could not be used. It may be possible to have user-defined functions, but then we'd be turning JSON Schema into a scripting language, which is contrary to its current declarative style.

hrennau commented 4 years ago

Greg, the point is not whether or not a particular constraint can be expressed using the current vocabulary, but whether or not a mechanism is desired for expressing constraints beyond the limits of a predefined vocabulary. (Remember ticket json-schema-org/json-schema-vocabularies#8.) In particular, such support is required if what is commonly called "business rules" should be expressible.

I am also aware that the keywords "if", "then", "else" enable in some cases constraints which might be called business rules, but I suspect that the means to do this - expressing a condition in terms of validity against a "test constraint" - will quickly result in convoluted, unnatural constructs.

The issue you raised concerning functions points to the basic question: what expression language to support? (I think supported functions are an aspect of the expression language.) Perhaps a key problem is that, for JSON, there is no obvious candidate for a standardized expression language. The spec might also define a generic mechanism, treating the actual language as implementation-defined. (Here again, much can be learned from SHACL.) But let me mention that XPath 3.1 would be a candidate, standardized in a process taking more than a decade and capable of dealing with JSON. Concerning JSONPath, a question to answer would be whether its maturity and degree of standardization are sufficient.

I understand your apprehension about turning JSON Schema into a scripting language, but this need not be the case. A crucial aspect is the understanding of the concept "expression". XSD 1.1 or SHACL are certainly not scripting languages, although they allow the schema author to use expressions or queries which might be non-trivial.

Yes - to what extent "declarativeness" is diminished by expressions is not an easy question. And it is certainly a valid position to reject expression based assertions, for reasons related to simplicity and purity. That is why I raised the topic - I want to learn about the prevailing attitudes. Thank you for responding.

gregsdennis commented 4 years ago

but I suspect that the means to do this - expressing a condition in terms of validity against a "test constraint" - will quickly result in convoluted, unnatural constructs.

It does. I was against the inclusion of if-then-else, and @handrews was reluctant to add it. anyOf/oneOf are better constructs for these applications, but the masses wanted something that was more "readable." I'm not sure that goal was attained.

what expression language to support?

Going alongside this, JSON Schema is supposed to be representable in JSON. This means that any expressions would have to be embedded in strings if it's in another language.

If you can build an expression with JSON, then I suggest that authoring an "expressions" vocabulary with an associated meta-schema would be appropriate. Then, any library (or extension) that understands such a vocabulary would know how to evaluate them.
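To give a feel for what "an expression built with JSON" might look like, here is a small Python sketch of an evaluator for a purely hypothetical expression tree. The "op", "args" and "var" keys and the operator names are invented for this example and do not come from any existing vocabulary.

```python
import operator

# Hypothetical operator names; a real vocabulary would define its own set.
OPERATORS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def evaluate(expr, instance):
    """Evaluate a JSON-encoded expression tree against an object instance."""
    if isinstance(expr, dict) and "var" in expr:
        # {"var": "name"} looks up a property of the instance.
        return instance[expr["var"]]
    if isinstance(expr, dict) and "op" in expr:
        if expr["op"] == "and":
            return all(evaluate(arg, instance) for arg in expr["args"])
        left, right = (evaluate(arg, instance) for arg in expr["args"])
        return OPERATORS[expr["op"]](left, right)
    # Anything else is treated as a literal value.
    return expr


rule = {"op": "lte", "args": [{"var": "minAge"}, {"var": "maxAge"}]}
print(evaluate(rule, {"minAge": 18, "maxAge": 30}))
```

Because the rule is plain JSON, it could be described by a meta-schema and ignored safely by implementations that do not know the vocabulary.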

handrews commented 4 years ago

@hrennau

In particular, such support is required if what is commonly called "business rules" should be expressible.

In general, we have a fairly long history of saying that business rules are not intended to be expressible in JSON Schema. Look at issues tagged with $data (which was an old keyword proposal, but the tag kind of turned into a generic "issues about business logic" tag) for more discussion.

JSON Schema validation is intended to validate the structure of JSON. It is not intended to validate application semantics. The format keyword sort-of went in that direction, but you'll note that in the latest draft, by default format MUST NOT be validated (you can ask an implementation to validate it in various ways, but it's not reliable; it never has been, and I don't think anyone ever fully implemented the keyword).

I'd personally like to dump it altogether and encourage 3rd-parties to write targeted vocabularies for specific cases (e.g. date-time formats are easily validated, email addresses are not, and there is no good reason they should be handled by the same keyword).

I don't object to people finding solutions for specific business cases, but I would be against writing a general expression system into JSON Schema. It does not need to be all things to all people, and most problems with it have come from trying to shove it into use cases that it doesn't actually solve.

(also paging @awwright )

hrennau commented 4 years ago

Greg,

interesting points! Indeed, there are two approaches to representing expressions: opaquely, as an embedded string, or as a tree of keywords. XSD 1.1 uses embedded strings; SHACL supports both: (a) SPARQL strings (W3C Recommendation) and (b) "node expressions" (W3C Note, "Advanced Features").

I am struck by the correspondence of your suggestion with what has been defined here:

https://www.w3.org/TR/shacl-af/ Namely the sections:

5 SHACL Functions
6 Node expressions
7 Expression Constraints

Lo and behold, "node expressions" match your idea of an "expressions" vocabulary. Interestingly, there are only seven "keywords"!

And also note that SHACL functions are defined by the spec (W3C Note) in terms of a signature, while the implementation language is left implementation-defined:

The actual execution logic (or algorithm) of a SHACL function can be declared in a variety of execution languages. This document defines one specific kind of SHACL functions, the SPARQL-based functions. JavaScript-based Functions are defined in the separate SHACL-JS document [shacl-js]. The same function IRI can potentially be executed on a multitude of platforms, if it declares execution instructions for these platforms. ( From: https://www.w3.org/TR/shacl-af/#functions )

handrews commented 4 years ago

I'd be fine with someone working up an expressions vocabulary. That's what vocabularies are for. My objection is only to putting it into Core or Validation. If you can make it work as a vocabulary then please do!

handrews commented 4 years ago

(we should probably make a better-named tag than $data, but I can't be bothered right now so I'm tagging this one with it- you can't just rename them or you break everyone's links)

handrews commented 4 years ago

@hrennau see also @awwright's guidance on the scope of JSON Schema validation.

hrennau commented 4 years ago

@handrews Thank you, Henry, for your views, for pointing out the relationship to $data tagged discussions and sharing the link. I strongly sympathize with your caution - it is so important to keep the core clean. Therefore your point of view - "write a vocabulary!" - appears to me sound.

Much thinking required, though. I am against using the term "application logic" as an easy argument for keeping something out of scope. I think we are touching upon key concepts related to validation, and we should be aware of them as such and have clear positions, in order to be quite sure what should be expected from the core, and what not. (Not necessarily in the official version One, though.)

Namely, we are dealing with the concept of mapping - reasoning about the validity of a value by constraints applied to the result of mapping the value to another value. Your $data initiative did this in very pure (thus at the same time very restricted and very general) form: the relative JSON pointer is minimal navigation, and navigation is a specialized form of mapping.
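The "navigation as mapping" idea can be sketched in Python as follows. This is only an illustration of resolving a relative JSON pointer such as 1/minAge from a given location in an instance; it handles the integer-prefix-plus-pointer form only (not the trailing # form), and the function name resolve_relative and its arguments are assumptions made for this example.

```python
import re


def resolve_relative(document, base_path, relative_pointer):
    """Map a location (list of tokens) plus a relative pointer to a value."""
    match = re.match(r"(0|[1-9][0-9]*)(.*)$", relative_pointer)
    if match is None:
        raise ValueError("relative pointer must start with a non-negative integer")
    steps_up, rest = int(match.group(1)), match.group(2)
    if steps_up > len(base_path):
        raise ValueError("cannot move above the document root")
    # Walk up the given number of levels from the starting location.
    tokens = list(base_path[: len(base_path) - steps_up])
    if rest:
        if not rest.startswith("/"):
            raise ValueError("only the '<n>/<pointer>' form is supported here")
        # Unescape JSON Pointer tokens: ~1 is '/', ~0 is '~'.
        tokens += [t.replace("~1", "/").replace("~0", "~") for t in rest[1:].split("/")]
    node = document
    for token in tokens:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


instance = {"age": {"minAge": 21, "maxAge": 30}}
# Starting at /age/maxAge, go up one level and then down to minAge.
print(resolve_relative(instance, ["age", "maxAge"], "1/minAge"))  # 21
```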

So my concern is that the distinctions made and the decisions taken are led by sound intuition, yet somewhat lack a rock-solid foundation, carved from a deliberate combination of acceptance and refusal of options existing on the conceptual plane. What is the relationship of JSON Schema to the mapping of values? I wonder whether the core should not already be "aware" of the possibilities, so that vocabularies to come move within a conceptual framework set by the core.

handrews commented 4 years ago

@hrennau See json-schema-org/json-schema-spec#855 for my current thoughts on this (not the project's official thoughts; I just filed that, and no one else has weighed in yet).

It covers why $data is not a suitable solution, and while it definitely does not define any sort of expression system, it outlines how a schema keyword MAY interact with instance data.

In theory, I think you could construct fairly complex expression keywords as an extension vocabulary. But it would definitely be an extension and not the current Core or Validation specifications. JSON Schema has proven useful for nearly a decade without an expression system, so I do not see the need to burden the existing specifications with such a complex extension. Vocabularies are the appropriate place to push the boundaries of JSON Schema's use cases.

For an example of a complex data-loading keyword, take a look at JSON Hyper-Schema's links keyword, specifically the href keyword within the link object which loads instance data into an RFC 6570 URI Template, either through the simple approach defined in RFC 6570 or a more complex adjusted mapping using hrefSchema.
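For readers unfamiliar with URI Templates, the "simple approach" of loading instance data into a template can be approximated in Python as below. This sketch covers only the most basic RFC 6570 form ({name} with a single variable, percent-encoding everything outside the unreserved set) and is not how any particular Hyper-Schema implementation works; expand_simple and the sample template are invented for illustration.

```python
import re
from urllib.parse import quote


def expand_simple(template: str, instance: dict) -> str:
    """Expand {name} expressions using values taken from the instance."""

    def substitute(match):
        value = instance.get(match.group(1))
        # Undefined variables expand to the empty string.
        if value is None:
            return ""
        # safe="" percent-encodes everything except unreserved characters.
        return quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


print(expand_simple("/orders/{id}", {"id": "a b/c"}))  # /orders/a%20b%2Fc
```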

shishirvk commented 4 years ago

hello @handrews , @hrennau

I have been trying to get the following schema to work:

{
  "title": "Sample Schema",
  "type": "object",
  "properties": {
    "age": {
      "title": "Age Criteria",
      "type": "object",
      "properties": {
        "minAge": {
          "title": "Minimum Age",
          "type": "integer",
          "minimum": 18,
          "maximum": 100,
          "default": 18
        },
        "maxAge": {
          "title": "Maximum Age",
          "type": "integer",
          "minimum": {"$data":"1/minAge"},
          "maximum": 100,
          "default": 18
        }
      }
    }
  }
}

It would be a great help if you or anyone on this thread could tell me:

1) Is the $data keyword available as of today to use in JSON Schemas?
2) If not, are there any other alternatives?

Relequestual commented 4 years ago

$data is not part of the specification. There are currently not any alternatives. Sorry.

We're working on defining how extensions MAY do this here: https://github.com/json-schema-org/json-schema-spec/issues/855

There will be some uptick from releasing a 2019-09 patch (2020-0*) till implementations are ready, and till others start writing their own vocabularies.

shishirvk commented 4 years ago

Thank you !

handrews commented 4 years ago

I'm moving this to the vocabularies repo. I do not see a need for it in core, and I see a compelling need to keep this complexity out of core. It may or may not be doable as a vocabulary, but it should start there.

json-schema-org / json-schema-vocabularies

JSON Schema, fundamentally - expression based assertions #25
