Closed jviotti closed 4 weeks ago
@michaelmior I just added a real example based on properties
and additionalProperties
, but yes:
On evaluation path, you might start evaluating /properties
, which then takes you to /properties/foo/type
, which then takes you to /properties/bar/type
, etc. It is where you are in the evaluation process with regards to the schema.
In the current evaluator implementation, we directly keep the instance location and evaluation paths as pointers that we push to and pop from when executing every individual instruction:
Perfect! That clarifies things. Is it true that the actual value of annotations never need to be returned in the current implementation?
Correct. You never need to "retrieve" the annotation values, just check containment given the adjacent and non-adjacent algorithms. That said, we do pass references to the value at the time of emission to the evaluation callback to the user for them to hook on them if needed (for whatever they are doing or for reporting reasons). But internally, the evaluator doesn't need them.
hey @jviotti :)
I gave a first look at this document.
Here are my questions mark:
I struggle to understand why do we have two pointers here. Not why we pass two pointers, because I guess that it is due consumer constraints. But rather, why should we store them for annotations. The way that I understand it is that evaluate_path
is just an extension of instance_location
. Then I wonder if we could not have a path splited as tree.
Or maybe do you prefer to use a Map because its way more easier to implement?
Regarding the defines_adjacent
and defines
APIs. When you said "from keywords bar or baz": Is that mean that const Set<String> &keywords
contains keyword that have a "OR" relationship? What if we want a "AND" relationship?
@tony-go
I struggle to understand why do we have two pointers here. Not why we pass two pointers, because I guess that it is due consumer constraints. But rather, why should we store them for annotations. The way that I understand it is that evaluate_path is just an extension of instance_location. Then I wonder if we could not have a path splited as tree. Or maybe do you prefer to use a Map because its way more easier to implement?
They are not related to each other. Each represent an orthogonal path that might change on its own. For example, consider this schema and instance based on the one I showed above:
// Schema
{
"properties": {
"foo": { "type": "string" }
},
"additionalProperties": {
"type": "integer"
}
}
// Instance
{ "foo": "bar", "one": 1, "two": 2 }
The evaluation with regards to those pointers goes like this :
""
(the root of the schema) to instance location ""
(the root of the instance)"/properties"
to instance location ""
, as we evaluate properties
against the top level object"/properties/foo/type"
to instance location "/foo"
, as we evaluate the foo subschema against foo"/additionalProperties"
to instance location "/one"
, the first additional property"/additionalProperties"
to instance location "/bar"
, the second additional propertyAs you see, each moves pretty much independently and you cannot consider one as an extension of the other.
I suggested a map above as a theoretical representation, but if we can do a tree, that sounds good too. Whatever satisfies those requirements.
Regarding the defines_adjacent and defines APIs. When you said "from keywords bar or baz": Is that mean that const Set
&keywords contains keyword that have a "OR" relationship? What if we want a "AND" relationship?
It is an OR. I'll clarify!
@tony-go @michaelmior One thing that could help with allocations is the observation that for the evaluation path, its JSON Pointer tokens almost always correspond to JSON Schema keywords (i.e. /propreties/foo/items/type
). In that way, the tokens are not completely arbitrary and repeat quite often. For example, most schemas use properties
and type
.
One interesting thing we could do is have a variant of the sourcemeta::jsontoolkit::Pointer
class (or extend that one?) so that you can instantiate it based on a "pool" of potential property tokens, so that instead of allocating the strings every time, we reference them from the pool?
i.e. imagine you can construct the pointer with a second argument that is the pool or something like that
so that you can instantiate it based on a "pool" of potential property tokens
I think that makes sense. I suspect that you'll have a lot of repetition even among tokens that aren't keywords, so the pool approach makes sense to me even without pre-populating the pool.
@jviotti
As you see, each moves pretty much independently and you cannot consider one as an extension of the other.
Yeah, way more clear now! Also I wonder if we could not rename them to:
shcema_location
/ instance_location
schema_pointer
/ instance_pointer
I think it would be less confusing if it was like that. WDYT?
It is an OR. I'll clarify!
❤️
I think it would be less confusing if it was like that. WDYT?
We shouldn't be inventing our own names. instance_location
and evaluate_path
are the right terms defined by the JSON Schema specification itself.
@michaelmior
Perfect! That clarifies things. Is it true that the actual value of annotations never need to be returned in the current implementation?
Actually, I'm realising there is one case where we do need this: prefixItems
emits an index of the last array position covered by it, which items
needs to get. So I guess you do need to get all of the annotations produced for a certain location after all
Closing this, given the work we are doing to make most of the compiler annotation agnostic
In JSON Schema, annotations are JSON values associated to a given instance location (a JSON Pointer) and evaluation path (a JSON Pointer). A container to hold annotations, in theory could be something like this:
Note that there can be multiple JSON values associated with a given instance location and evaluation path, but duplicates are not taken into account.
Write Operations
Insertion: You need to be able to insert a JSON document into a given instance location and evaluation path. The challenge here is to avoid copying the given pointers and the JSON value.
Removal: You need to be able to remove all annotations produced at a given evaluation path or its children, independently of its instance location. For example, if you want to remove annotations at evaluate path
/properties/not
, you also remove any annotation produced at evaluate path/properties/not/foo
and/properties/not/foo/bar/baz
.Read Operations
Adjacent containment: You need to be able to check if a given set of keywords (OR) emitted the given annotation value for a given instance location, sibling to the given evaluation path.
For example, you might be looking for the annotation value
"foo"
at instance location/myObject
, from keywordsbar
orbaz
, sibling to evaluation path/one/two/three
. That means the following combinations are valid:/myObject
//one/two/bar
/"foo"
/myObject
//one/two/baz
/"foo"
General containment: You need to be able to check if a given set of keywords (OR) emitted the given annotation value for a given instance location, sibling to the given evaluation path or on any of its children
For example, you might be looking for the annotation value
"foo"
at instance location/myObject
, from keywordsbar
orbaz
, sibling to evaluation path/one/two/three
. That means all the following combinations are valid:/myObject
//one/two/bar
/"foo"
/myObject
//one/two/baz
/"foo"
/myObject
//one/two/whatever/a/b/c/bar
/"foo"
Real Example
Consider the following schema. If the instance is an object, if it defines the property
foo
, the latter must be a string. If it defines any other property that is notfoo
, that property must be an integer.Now consider the following instance:
Relying on annotations, the evaluation goes as follows:
properties
. Thefoo
property is defined and it validates against the given subschema""
(instance location, the root of the document) /"/properties"
(evaluation path) /"foo"
(value)additionalProperties
""
(instance location) /"/properties"
OR""
(instance location) /"/patternProperties"