json-patch / json-patch2

A possibile revision to the JSON-Patch format
44 stars 0 forks source link

A better JSON Pointer #19

Closed WebReflection closed 6 years ago

WebReflection commented 6 years ago

Writing here as suggested by Mark Nottingham.


The RFC6901 specification looks very JSON-unfriendly, re-inventing arrays through escaped strings, easily error prone while manipulating any path in projects such just-diff.

A different proposal

A JSON pointer can be just an Array requiring zero special cases and granting type preservation. Being an Array, a JSON pointer can be used also as generic JavaScript path pointer including the ability to have a Symbol as part of the key.

Examples

Given the following structure:

{
  "biscuits": [
    { "name": "Digestive" },
    { "name": "Choco Leibniz" }
  ]
}

Instead of having /biscuits/1/name as path to point at "Choco Leibniz" string, the path could be ["biscuits", 1, "name"].

Since a path can be specified to use integers when it comes to Array access, it's now never ambiguous to understand the structure.

An object with numeric entries will have these represented as strings, an Array will have these represented as integers.

{
  "biscuits": {
    "5": { "name": "Choco Leibniz" },
    "2": { "name": "Digestive" }
  }
}

Accordingly, to point at "Choco Leibniz" we can now use ["biscuits", "5", "name"].

Edge cases

Empty keys are allowed via [""] and at any level as in ["", 2, "", "name"].

Last item

Using the convention that an integer, if negative, means from the Array length, it is possible to point at a generic last item via -1, as example via ["biscuits", -1].

If the length of the array is meant, as appending at the end, being length an array property the path could simply be ["biscuits", "length"].

Improvements

As typed array, JSON pointer would be a list/set that can contain either strings or integers.

As a wider solution, a JSON pointer could contain also JavaScript symbols which is a generic object key by all means, but this is out of current JSON Pointer scope/proposal.

Thanks in advance for any sort of outcome.

pbryan commented 6 years ago

In principle, I certainly like the idea of having a JSON structure to reference another. Any thoughts on how your proposed scheme could be reasonably encoded as a URI fragment identifier?

WebReflection commented 6 years ago

Interesting question. May I ask when/where this is needed since the whole JSON is not too friendly as URI fragment?

My best guess is that if it's so relevant, either we use exact same convention is in place now but integers are distinguished by a single dot as suffix, like 1.~prop~2~a~3. could be or I need to better understand use cases and real world examples to better think about it, thanks .

WebReflection commented 6 years ago

That was 1. ~ prop ~ 2 ~ a ~ 3.

pbryan commented 6 years ago

The fragment identifier is used to identify a JSON value in a JSON document referenced by a URI. It's a stated objective of the RFC, equivalent in purpose of (and inspired by) the XML Pointer specification. In a notable example, it's used in JSON Schema in "$ref" JSON references.

WebReflection commented 6 years ago

edited to reflect latest proposal which is IMO superior

Given the following document

{
  "foo": ["bar", "baz"],
  "": 0,
  "a/b": 1,
  "c%d": 2,
  "e^f": 3,
  "g|h": 4,
  "i\\j": 5,
  "k\"l": 6,
  " ": 7,
  "m~n": 8,
  "a.b": 9,
  "0": 10,
  "1": ["biz"],
  "'2'": 12
}

We could represent each access through the following notation:

#            // the whole document
#.foo        ["bar", "baz"]
#.foo.0      "bar"
#.           0
#.a/b        1
#.c%25d      2
#.e%5Ef      3
#.g%7Ch      4
#.i%5Cj      5
#.k%22l      6
#.%20        7
#.m~n        8
#.a%2Eb      9
#.'0'        10
#.'1'.0      "biz"
#.%272%27    12

The accessing semantic reflects common object properties access through a . dot. Whenever an index is a string and it looks like a number, it's wrap through ' single quote reflects the explicit semantic intent.

The choice of a sub-delims respects its RFC3986 purpose:

If a reserved character is found in a URI component and no delimiting role is known for that character, then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII.

Special chars in the key

Since all keys need to be uri-decoded, when special characters such . and ' are part of the key, a surely not so common case, their representation must be encoded as %2E for the dot and %27 for the single quote.

Numeric index VS string index

The common Array index accessor is preserved as integer. The following regular expression would match any array index, including negatives: ^-?[0-9]+$ Integers are preserved in encoding thanks to their type information (an index of an Array is numeric, a key in a hash/object is string)/

In the not so common case a key is represented by a plain number and matches the previous regular expression, it must be wrapped in single quotes so that it'll look like '1' or even '-1'.

Pros

Most common JSON cases will be easier to read and write.

.foo.bar.0.name

.foo.bar.-1.age

The semantic doesn't need much explanation since it's regular JavaScript notation with the exceptions for indexes but it's easy to reason about the choice.

Same applies for numeric keys

.foo.bar.-1.biz.'123'.role

There is no ambiguity or specific order of encoding/decoding to remember so this proposal is also less error prone for humans.

Cons

A human reader/writer needs to remember both %2E and %27 have special meaning.

However, this is in my opinion easier than remembering what's ~0, what's ~1, and in which order these must be encoded or decoded.

Thoughts ?

WebReflection commented 6 years ago

Alternative

Instead of opening closing parenthesis, we could consider using just ' single quote.

Whenever an object key is numeric, it's access would be better described by the quote

.foo.bar.0.name VS .foo.bar.'1'.name

and the amount of characters to remember would be just 2, but not conflicting with each other.

Same benefits, less to remember.

Accordingly, given the following document

{
  "foo": ["bar", "baz"],
  "": 0,
  "0": 1,
  "'": 2
}

we can access all properties as such:

#            // the whole document
#.foo        ["bar", "baz"]
#.foo.0      "bar"
#.           0
#.'0'        1
#.%27        2

I actually prefer this version over the previous one.

WebReflection commented 6 years ago

FYI I have implemented a 100% code-covered version of the latest proposal: https://github.com/WebReflection/json-pointer

<!doctype html>
<script src="https://unpkg.com/better-json-pointer@0.0.0/min.js"></script>
<script>
var encoded = JSON.Pointer.encode(['a', 0, '1', "'b.'"]);
console.log(encoded);
// ".a.0.'1'.%27b%2E%27"
console.log(JSON.Pointer.decode(encoded));
// ["a", 0, "1", "'b.'"]
</script>
WebReflection commented 6 years ago

is there any progress on this?

mnot commented 6 years ago

@WebReflection - so this issues list isn't a decision-making body, it's just where we can discuss things about a potential future effort.

Your proposal looks reasonable, but if we wanted to incorporate it into a future format, we'd have to determine whether the benefits outweighed the costs of having Yet Another Way To Do This.

It seems like there's a decent amount of interest in this, but we won't know for sure until we decide to wrap things up and start an effort to define a new spec.

WebReflection commented 6 years ago

thanks for the update. I just go through my issues from time to time and this was still open, hence my question. I'll keep it open then. Regards

aalquist-akamai commented 6 years ago

@mnot - JSON patch has been very useful for me but I've been hitting limits with being able to apply generic patches across documents. I started introducing tools like http://jsonpath.com/ to make my patching more scalable. It's ability to filter and do recursive searching can make updating json objects in any document structure much easier.

WebReflection commented 6 years ago

nothing is happening here, so I remove this ticket from my list of issues.