tobyink / p5-json-path


Indeterministic result ordering for some selectors #10

Open tobyink opened 3 years ago

tobyink commented 3 years ago

Migrated from rt.cpan.org #130488 (status was 'open')

Requestors:

From christoph.burgmer@gmail.com on 2019-09-10 21:38:27 :

For the given selectors and documents I can reproduce differently ordered results on re-runs:

Exhibit A

Selector: $[*]

Document: { "some": "string", "int": 42, "object": { "key": "value" }, "array": [0, 1] }

Outcome #1: [ 42, [ 0, 1 ], "string", { "key": "value" } ]

Outcome #2: [ { "key": "value" }, 42, [ 0, 1 ], "string" ]

Exhibit B

Selector: $.store..price

Document: See https://cburgmer.github.io/json-path-comparison/results/recursive_on_nested_object.html

Outcome #1: [ 19.95, 8.95, 12.99, 8.99, 22.99 ]

Outcome #2: [ 8.95, 12.99, 8.99, 22.99, 19.95 ]

Expected:

Deterministic ordering to reduce flakiness in output.

Rationale:

This is a blocker for adding this implementation to the comparison in the aforementioned project, see https://github.com/cburgmer/json-path-comparison/issues/4 .

I believe the nondeterminism comes from Perl's hashes not guaranteeing order. It's debatable, however, whether JSONPath should even guarantee order, see e.g. https://github.com/cburgmer/json-path-comparison/issues/3 . I'm happy to find other means for now to introduce a deterministic response, but I am limited by my knowledge of Perl. Happy for pointers. I looked at Tie::IxHash but can't figure out how to bring it in after JSON decoding.

tobyink commented 3 years ago

From popefelix@gmail.com on 2019-09-10 22:10:54 :

You are correct - the nondeterminism does come from Perl's hashes not guaranteeing order.

How do other JSONPath implementations handle this? Is there a spec that dictates this?

On Tue, Sep 10, 2019 at 4:38 PM Christoph Burgmer via RT < bug-JSON-Path@rt.cpan.org> wrote:

Tue Sep 10 17:38:27 2019: Request 130488 was acted upon.
Transaction: Ticket created by christoph.burgmer@gmail.com
Queue: JSON-Path
Subject: Indeterministic result ordering for some selectors
Broken in: 0.420
Severity: Normal
Owner: popefelix
Requestors: christoph.burgmer@gmail.com
Status: new
Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=130488 >

-- Kit Peters, W0KEH GPG public key fingerprint: D4FF AA62 AFEA 83D6 CC98 ACE5 6FAE 7E74 7F56 ED1D Hello to any and all NSA, DEA, or other government or non-government agents reading this email. Tell me about your life; I'll tell you about mine.

tobyink commented 3 years ago

From christoph.burgmer@gmail.com on 2019-09-11 07:08:25 :

All the other implementations I've tried (Java, PHP, Python, Ruby, ...) at least provide a stable response (possibly due to the underlying guarantees of the language). Compare e.g. row "Wildcard dot notation on array" in the table under https://cburgmer.github.io/json-path-comparison/.

The JSONPath "spec" doesn't seem to say anything about this, whereas for the actual JSON spec I've read conflicting interpretations - many discussions claim that order is not guaranteed, but I have yet to find this in any spec.

What I've done with Python when I wanted to maintain order is use an OrderedDict: https://github.com/cburgmer/json-path-comparison/blob/master/src/oneliner_json.py
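For readers unfamiliar with that approach, here is a minimal sketch of order-preserving decoding with Python's standard `json` module. It is an illustration only, not the contents of the linked `oneliner_json.py`; the variable names are made up for this example, and the document is the one from Exhibit A.

```python
import json
from collections import OrderedDict

# The document from Exhibit A.
document = '{ "some": "string", "int": 42, "object": { "key": "value" }, "array": [0, 1] }'

# object_pairs_hook receives each object's members as a list of
# (name, value) pairs in document order, so the decoded mapping keeps
# the order in which the members appeared in the text.
parsed = json.loads(document, object_pairs_hook=OrderedDict)

# A wildcard over the top-level object now has a stable order to follow.
wildcard_result = list(parsed.values())
print(json.dumps(wildcard_result))
# prints: ["string", 42, {"key": "value"}, [0, 1]]
```

The point is that the order has to be captured at decode time; once the members sit in an unordered container, the document order cannot be recovered.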

tobyink commented 3 years ago

From popefelix@gmail.com on 2019-09-11 14:14:56 :

So the JSON spec says that objects are unordered. In your Exhibit A, you're passing in an object, so there's no guarantee of how things will be ordered coming out. Exhibit B is similar - you'll note that 19.95 is either the first or the last element in the list, but never in the middle.

I'd be willing to work on a "canonical" flag similar to the flag used by Storable (https://metacpan.org/pod/Storable#CANONICAL-REPRESENTATION). Would that be sufficient?
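To make the idea concrete, here is a rough sketch of what a "canonical" option could mean for a JSONPath evaluator: object members are visited in sorted key order, in the spirit of Storable's canonical mode. It is written in Python for brevity and is not JSON::Path's actual API; the function names `children` and `recursive_lookup` and the `canonical` parameter are hypothetical.

```python
def children(node, canonical=False):
    """Return the child values of a decoded JSON node.

    With canonical=True, object members are visited in sorted key order,
    so the result no longer depends on the mapping's iteration order.
    """
    if isinstance(node, dict):
        keys = sorted(node) if canonical else list(node)
        return [node[key] for key in keys]
    if isinstance(node, list):
        return list(node)
    return []  # scalars have no children


def recursive_lookup(node, name, canonical=False):
    """Collect every value stored under `name`, roughly like `..name`."""
    found = []
    if isinstance(node, dict) and name in node:
        found.append(node[name])
    for child in children(node, canonical):
        found.extend(recursive_lookup(child, name, canonical))
    return found


# The Exhibit A document, built twice with different member order.
doc_a = {"some": "string", "int": 42, "object": {"key": "value"}, "array": [0, 1]}
doc_b = {"object": {"key": "value"}, "array": [0, 1], "int": 42, "some": "string"}

# Same members, different insertion order: the canonical results agree.
assert children(doc_a, canonical=True) == children(doc_b, canonical=True)
```

Sorting makes the output reproducible, but it does not recover the order in which members appeared in the JSON text, so it answers the flakiness concern rather than the question of document order.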

tobyink commented 3 years ago

From christoph.burgmer@gmail.com on 2019-09-12 15:24:55 :

I needed some time to think on this.

What currently makes the outcome hard to compare is that a query like $[*] on an object doesn't yield a stable representation. If there is a simple way of achieving that, it would help, yes. The ordering would already have to be preserved when parsing the JSON document, though (see https://github.com/cburgmer/json-path-comparison/blob/Perl_JSON-Path/implementations/Perl_JSON-Path/main.pl#L9), as I believe that step already introduces nondeterminism in Perl (decode_json returns a hash reference, I believe). Python, for example, does allow preserving the order while parsing.

Kit, my goal with the comparison project is to seek clarification on what "good" JSONPath is (including I guess when a certain order can be guaranteed), but I'm trying to keep my own personal opinion out of it, hence the whole approach of comparing via consensus.

tobyink commented 3 years ago

From christoph.burgmer@gmail.com on 2019-09-12 15:38:36 :

Just for reference, I've tried looking for a source on the much-quoted JSON standard, as it came up in other discussions as well, and so far have only found this:

The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

I've mentioned this in https://github.com/cburgmer/json-path-comparison/issues/3 where I'd try to focus the general discussion on ordering for JSONPath, but wanted to call this out here once.

tobyink commented 3 years ago

From popefelix@gmail.com on 2019-09-12 16:01:13 :

"Just for reference, I've tried looking for a source on the much quoted JSON standard"

Ah, that's an easy one! :)

RFC 7159 (http://www.rfc-editor.org/rfc/rfc7159.txt) states "An object is an unordered collection of zero or more name/value pairs" (emphasis mine)

tobyink commented 3 years ago

From popefelix@gmail.com on 2019-09-12 16:54:15 :

Sorry - I quoted an obsolete RFC. The correct one is RFC 8259 (https://datatracker.ietf.org/doc/rfc8259/?include_text=1), which again defines an object to be "an unordered collection of zero or more name/value pairs".

To your question above of what constitutes "good" JSONPath, I would say that good JSONPath produces results that a reasonable person would expect having read Goessner's definition and RFC 8259. Thus, to put the question in the context of this issue, if a JSONPath implementation imposes an order on the keys of an object, that implementation is out of compliance with the spec, consensus or no.

tobyink commented 3 years ago

From christoph.burgmer@gmail.com on 2019-09-12 18:52:46 :

Fair point, but let's also call out

JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

It seems some people have seen the need to pass the order along in some way.

Anyhow, let me know what you think is reasonable for your library. Feel free to give the json-path-comparison project a run, it's meant to help, not enforce :)

tobyink commented 3 years ago

From popefelix@gmail.com on 2019-09-12 18:58:25 :

You make a fair point as well. :)

I think that the best way to move forward is either for you to update your tests to not care about object order, or for me to add the "canonical" flag I mentioned above. I would prefer that you updated your tests, because that's less work on my part, but I'm OK with doing the work if need be. Or we could split the difference and I could advise you on adding the flag yourself. :D

Up to you.

KP
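The first option, tests that do not care about object order, could look roughly like the sketch below. It is written in Python and is not taken from the json-path-comparison project; the helper names `canonical_form` and `same_results_ignoring_order` are hypothetical. It compares two result lists as multisets of canonically serialized values.

```python
import json
from collections import Counter


def canonical_form(value):
    # sort_keys makes the serialization independent of the member
    # order inside any object contained in the value.
    return json.dumps(value, sort_keys=True)


def same_results_ignoring_order(left, right):
    # Counter compares the lists as multisets, so duplicates still count
    # but the position of each result does not.
    return Counter(map(canonical_form, left)) == Counter(map(canonical_form, right))


# The two outcomes reported for Exhibit A.
outcome_1 = [42, [0, 1], "string", {"key": "value"}]
outcome_2 = [{"key": "value"}, 42, [0, 1], "string"]
assert same_results_ignoring_order(outcome_1, outcome_2)
```

One caveat of this design: it ignores the order of the whole result list, including results that came from arrays, where order is well defined. A stricter comparison would need to know which results originated from objects.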
