json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Relative JSON Pointer should specify evaluation relative to JSON Pointer #1235

Open handrews opened 2 years ago

handrews commented 2 years ago

Currently, the Relative JSON Pointer (RJP) specification only specifies how to evaluate an RJP starting from a referenced value within a document. This is challenging to implement, as typical in-memory representations of JSON documents do not have child-to-parent links.

[EDIT: See also #1236 for an alternative approach]

I am implementing this spec at the moment by resolving the RJP against a regular JSON Pointer and then evaluating the resulting JSON Pointer, with added functionality to handle the use of # to get the property name or index number rather than the value. This is much easier to implement, and is a natural analog of resolving URI references against base URIs.

For RJPs that do not use the # feature, using the resolved pointer is identical to using a regular JSON Pointer.

Supporting the # feature involves making certain that there is a value at the location, and then returning the last component of the JSON Pointer.
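
As a rough illustration of the resolve-then-evaluate approach described above, here is a minimal Python sketch. The function names (split_pointer, resolve, evaluate, evaluate_rjp) and the sample document are assumptions made for this example, not part of either specification. Index manipulation and the - token are deliberately left out.

```python
import re

# Hypothetical helper names; a sketch, not a reference implementation.
_RJP = re.compile(r"(0|[1-9][0-9]*)(#|(?:/.*)?)", re.DOTALL)
_INDEX = re.compile(r"0|[1-9][0-9]*")


def split_pointer(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def resolve(base_tokens, rjp):
    """Resolve an RJP against base tokens; returns (tokens, wants_key)."""
    match = _RJP.fullmatch(rjp)
    if match is None:
        raise ValueError(f"unsupported RJP: {rjp!r}")
    up = int(match.group(1))
    if up > len(base_tokens):
        raise ValueError("RJP ascends above the document root")
    kept = base_tokens[: len(base_tokens) - up]
    if match.group(2) == "#":
        return kept, True
    return kept + split_pointer(match.group(2)), False


def evaluate(document, tokens):
    """Evaluate reference tokens against a document, as for a plain JSON Pointer."""
    value = document
    for token in tokens:
        if isinstance(value, dict):
            value = value[token]
        elif isinstance(value, list) and _INDEX.fullmatch(token):
            value = value[int(token)]
        else:
            raise LookupError(token)
    return value


def evaluate_rjp(document, base_pointer, rjp):
    tokens, wants_key = resolve(split_pointer(base_pointer), rjp)
    value = evaluate(document, tokens)  # raises if there is no value at the location
    if not wants_key:
        return value
    if not tokens:
        raise ValueError("the document root has no property name or index")
    parent = evaluate(document, tokens[:-1])
    # '#' returns the last component: an integer for arrays, a string for objects.
    return int(tokens[-1]) if isinstance(parent, list) else tokens[-1]


doc = {"foo": ["bar", "baz"], "highly": {"nested": {"objects": True}}}
print(evaluate_rjp(doc, "/foo/1", "1/0"))  # bar
print(evaluate_rjp(doc, "/foo/1", "0#"))   # 1
print(evaluate_rjp(doc, "/foo/1", "1#"))   # foo
```

Note that no parent links are needed: ascending is just dropping tokens from the end of the base pointer.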

If this approach is supported [EDIT: In addition to the current approach], then the RJP spec also needs to specify what happens if the JSON Pointer involves a - as an array index to which # is applied. The - token indicates one position beyond the end of the array in the instance. The JSON Pointer spec states:

Note that the use of the "-" character to index an array will always result in [a lack of concrete value] error condition because by definition it refers to a nonexistent array element. Thus, applications of JSON Pointer need to specify how that character is to be handled, if it is to be useful.

In this sense, RJP is an "application" of JSON Pointer. The interesting case here is if, after moving upwards (removing components from the base pointer per the initial number in the RJP), the last remaining component is -. In this case, applying # should produce the index of that hypothetical next array item:

Given:

The resolved JSON Pointer would be /array/-, to which we would apply the # operation, returning 3 (the length of the array, which is also the index of the hypothetical next value). This is a useful piece of information to be able to access.

Note that it does not matter that the full base JSON Pointer does not resolve to anything. Just that /array must resolve to an actual array, so that - has something to measure.
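
To make the proposed behavior concrete, here is a minimal Python sketch. The instance, the base pointer /array/-/foo, and the names tokens_of, step, and hash_result are all assumptions chosen for illustration; they are not taken from the spec or from any implementation.

```python
import re

_INDEX = re.compile(r"0|[1-9][0-9]*")


def tokens_of(pointer):
    """Split a JSON Pointer into unescaped tokens (no syntax validation here)."""
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]


def step(value, token):
    """Descend one reference token; raises LookupError if nothing is there."""
    if isinstance(value, dict):
        return value[token]
    if isinstance(value, list) and _INDEX.fullmatch(token):
        return value[int(token)]
    raise LookupError(token)


def hash_result(document, base_pointer, up):
    """Ascend `up` levels from base_pointer, then apply the RJP '#' operation."""
    tokens = tokens_of(base_pointer)
    if up >= len(tokens):
        raise ValueError("'#' has no property name or index at or above the root")
    kept = tokens[: len(tokens) - up]
    # Only the parent of the final token has to exist in the document.
    parent = document
    for token in kept[:-1]:
        parent = step(parent, token)
    last = kept[-1]
    if last == "-" and isinstance(parent, list):
        # Proposed behavior: index of the hypothetical next array item.
        return len(parent)
    step(parent, last)  # otherwise there must be a value at the location
    return int(last) if isinstance(parent, list) else last


# Hypothetical instance and base pointer, assumed for this example.
instance = {"array": ["a", "b", "c"]}
print(hash_result(instance, "/array/-/foo", 1))  # 3
```

Here the full base pointer /array/-/foo never resolves to a value; only /array is looked up in the instance.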

This brings up another difference in the relative-to-base-pointer approach, which is that the base pointer need not be valid against an instance. Only the resolved pointer (ignoring # which is not part of regular JSON Pointer) must either point to an actual location in the document, or must end in - applied to an actual array in the document as above.

This does mean that there are scenarios that are possible with this approach that are not possible if the base pointer must point to an actual location. We could restrict the base pointer to only ones that point to an actual location if we want to, but then this "give me the length of the array" feature is not possible.


As a side note, I have not found any standalone implementation of RJP in Python, and wonder if this is part of the reason why. It's hard to write a generic implementation of RJP as specified without essentially re-inventing JSON Pointer to track the initial location in the document. Otherwise you need a more complex data structure with parent links.


I am not sure what milestone to put on this. It is obviously bigger than a "patch" but it's not clear to me that RJP needs to be locked to the JSON Schema spec progress as a.) this is backwards-compatible, and b.) it has no impact on the JSON Schema metaschema.

handrews commented 2 years ago

Also, yes, I'm volunteering to write a PR for this if accepted. I'm also happy for someone else to write it if that is preferred.

notEthan commented 1 year ago

I've just been implementing RJP today. (because apparently I wasn't already sidetracked enough from actually finishing / releasing my 2020-12 implementation.) I also have it operating primarily on my representation of a JSON pointer, rather than a document. For operating on documents, I have a reference from each node to the root, not to its parent, so the prescribed RJP algorithm, which looks at what the referenced value is contained within, doesn't work so well. So I ascend from a starting pointer and I evaluate the resulting pointer at the end. I also have a mode that only takes a pointer and returns another pointer (or with #, its last token).

(note: I'll use 'token' to mean 'reference token' as described by JSON pointer.)

There are two GitHub issues, but I see three related issues to address.

  1. Evaluation on a pointer (or its tokens) instead of ascending from a referenced value within a JSON document.

    This seems good to me, a natural way to implement RJP.

    The intermediate result - the resolved pointer - does not require a document and seems useful. In my implementation if you pass a pointer without a document to an RJP for evaluation, you get the resolved pointer as the result. This almost works fine, but # complicates things.

    I wasn't certain if you intended in this issue for RJP's evaluation on a pointer to have a specified result in the absence of a document. (I thought so at first but reading again not so much.) I'll explore it because it is of interest to me, but maybe I am getting off into the weeds a bit.

    Operating on a pointer without a document, resulting in a pointer, is no problem without #. With # it can return the last token, but that becomes a bit inconsistent where the token could apply to an array - when the last token is a string of digits, or as you introduce above, -. With a document, the result of # on an array is an integer; without, it could be a property name or an index, so a string seems safest. This is also affected by index manipulation, which if present would indicate an array index.

    • rjp 0# on pointer /foo: must be a string.
    • rjp 0# on pointer /0: could be an index or a property name; only a string seems safe.
    • rjp 0+0# on pointer /0: index manipulation can only apply to an array index, so an integer result makes more sense here (though whether this is a valid rjp is ambiguous, see #1175).

    Maybe # should just always return a string (at least operating on a pointer?) - if array indices encoded as strings are good enough for JSON pointer, good enough for RJP here. Or maybe this is just beyond what the specification wants to specify.

  2. An array-length result from a pointer's final - token with RJP #.

    I implemented this after reading these issues. Minor pain in the ass, having to partially evaluate up to the penultimate token, then check whether the token is -, whether the instance is an array, and whether # was used, before continuing. It's not terrible to implement - I don't know quite what the utility of it would be, but then I don't know hyperschema, which I think is the intended application.

    This requires (1) since the - must come from an initial pointer. This would only apply when operating on a document (though the idea of operating without a document is somewhat tentative, per my maybe-in-the-weeds exploration in (1)). A pointer containing - points outside the document, so this requires (3).

  3. A starting pointer that refers to a location outside the instance document.

    Though suggested in #1236 to be an alternative, this seems to require (1), and is itself prerequisite to (2). The current spec operates entirely on the current referenced value in a document, which does not initially exist in this case, so operating on a pointer per (1) rather than a document is required. And for (2) to apply, the starting pointer must contain a -, which by definition points outside the document. (This would not be true if an RJP could combine a # with its own JSON pointer hypothetically ending in -, but it cannot.)
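
The document-free mode explored in (1) could be sketched roughly as follows in Python. The function name resolve_without_document is hypothetical. The permissive handling of index manipulation before # (including a zero offset) is an assumption, given the ambiguity noted in #1175. Returning an integer only when index manipulation is present is just one of the options discussed above, not settled behavior.

```python
import re

# Permissive grammar, assumed for illustration: allows an offset before '#'.
_RJP = re.compile(r"(0|[1-9][0-9]*)([+-][0-9]+)?(#|(?:/.*)?)", re.DOTALL)


def resolve_without_document(base_pointer, rjp):
    """Resolve an RJP against a base JSON Pointer with no document at hand.

    Returns the resolved pointer as a string, or for '#' the last token.
    """
    match = _RJP.fullmatch(rjp)
    if match is None:
        raise ValueError(f"not a relative JSON pointer: {rjp!r}")
    up, shift, suffix = match.groups()
    tokens = base_pointer.split("/")[1:]  # tokens stay escaped so they can be rejoined
    if int(up) > len(tokens):
        raise ValueError("RJP ascends above the document root")
    kept = tokens[: len(tokens) - int(up)]
    if shift is not None:
        # Index manipulation only makes sense on something shaped like an array index.
        if not kept or re.fullmatch(r"[0-9]+", kept[-1]) is None:
            raise ValueError("index manipulation requires an array index token")
        shifted = int(kept[-1]) + int(shift)
        if shifted < 0:
            raise ValueError("index manipulation produced a negative index")
        kept[-1] = str(shifted)
    if suffix == "#":
        if not kept:
            raise ValueError("the document root has no property name or index")
        last = kept[-1].replace("~1", "/").replace("~0", "~")
        # Without a document, a digit string may be an index or a property name,
        # so it stays a string unless index manipulation implies an array.
        return int(last) if shift is not None else last
    return "".join("/" + token for token in kept) + suffix


print(repr(resolve_without_document("/foo", "0#")))        # 'foo'
print(repr(resolve_without_document("/0", "0#")))          # '0'
print(repr(resolve_without_document("/0", "0+0#")))        # 0
print(repr(resolve_without_document("/array/-/foo", "1"))) # '/array/-'
```

The last call shows the intermediate result that (2) and (3) depend on: a resolved pointer ending in -, produced without consulting any document.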


In conclusion: I fully support (1). I am not opposed to (2), am not sure of its utility, but trust you that it has utility, probably in hyperschema, which is worth complicating the implementation (not enormously but significantly). (3) seems both natural to allow given (1), and required for (2).

This got somewhat longer than I intended. I think it's mostly relevant, except the no-document exploration in (1) can be discarded from consideration if it's out of scope.

gregsdennis commented 2 weeks ago

@handrews what is the action here? Does RJP need a new section/paragraph?