json5 / json5-spec

The JSON5 Data Interchange Format
https://spec.json5.org
MIT License

Define object name duplicate behavior and require implementations support behavior `unique` #38

Open zamicol opened 2 years ago

zamicol commented 2 years ago

Proposal

Even though JSON RFC 8259 states that "names within an object SHOULD be unique", it leaves duplicate object name behavior undefined. The RFC warns that only objects with unique names are guaranteed interoperability since "all software implementations receiving that object will agree on the name-value mappings", but it does not prohibit duplicate names.

I propose that JSON5 explicitly define object name duplicate behavior while guaranteeing full compatibility with JSON. Explicitly defining behavior increases system interoperability, reduces the potential for bugs, and gives users fewer surprises.

JSON5 should define the following four object name duplicate behaviors:

unique requires JSON5 implementations to fail on duplicate object names.

last-value-wins requires JSON5 implementations to deduplicate and only report the last name/value pair.

duplicate requires JSON5 implementations to permit duplicates and preserve name/value pairs.

undefined permits JSON5 implementations to handle duplicate behavior in any way.
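As a rough illustration of the three defined behaviors, here is a minimal sketch that applies each one to an ordered list of name/value pairs. The function name `resolveDuplicates` and the pair-list representation are assumptions made for this example; they are not part of the JSON5 spec or of any implementation.

```javascript
// Hypothetical helper (not from the spec or any library): resolves duplicate
// names in an ordered list of [name, value] pairs for a given behavior.
function resolveDuplicates(pairs, behavior) {
  switch (behavior) {
    case 'unique': {
      // Fail as soon as a name is seen a second time.
      const seen = new Set();
      for (const [name] of pairs) {
        if (seen.has(name)) {
          throw new SyntaxError(`Duplicate object name: ${JSON.stringify(name)}`);
        }
        seen.add(name);
      }
      return pairs.slice();
    }
    case 'last-value-wins': {
      // A Map keeps one entry per name; set() on an existing name overwrites its value.
      const map = new Map();
      for (const [name, value] of pairs) map.set(name, value);
      return [...map.entries()];
    }
    case 'duplicate':
      // Keep every pair, including repeated names, in source order.
      return pairs.slice();
    default:
      throw new RangeError(`Unknown duplicate behavior: ${behavior}`);
  }
}

const examplePairs = [['a', 1], ['b', 2], ['a', 3]];
console.log(JSON.stringify(resolveDuplicates(examplePairs, 'last-value-wins'))); // [["a",3],["b",2]]
console.log(resolveDuplicates(examplePairs, 'duplicate').length); // 3
```

The undefined behavior is deliberately left out of the sketch, since by definition it places no constraint on the result.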

Further, JSON5 should

This proposal distinguishes between applications and implementations. It suggests that JSON5 implementations MUST define their behaviors and MUST support unique, but applications may behave however they like. The behavior of a particular application may be noted in documentation, conveyed by APIs, or simply not documented at all. This proposal is not suggesting that particular applications or APIs must support unique behavior. Also, this proposal is not suggesting standardizing a method for application behavior selection.

Related thoughts

Many JSON authority figures have expressed their desire for unique names. It's reasonable for JSON5, which makes minor improvements to JSON, to take this opportunity to implement this hope.

The behavior of Crockford's Java JSON implementation is to error on duplicate object names. Although not JSON5, Crockford's implementation already complies with this proposal since it supports unique.

Also, Crockford suggested modifying the JSON RFC to require unique object names, although it was decided it was too late to do so:

The names within an object SHOULD be unique. If a key is duplicated, a parser SHOULD reject. If it does not reject, it MUST take only the last of the duplicated key pairs.

Disallowing duplicates conforms to the small I-JSON RFC. The author of I-JSON, Tim Bray, is also the author of JSON RFC 8259.

There are also security and interoperability problems with duplicates. See the article "An Exploration of JSON Interoperability Vulnerabilities".

jordanbtucker commented 2 years ago

Thank you for this well-thought-out, well-written proposal. Would you mind elaborating on the difference between implementation and application? Those aren't terms already described in the spec, so they may need to be defined if we include them.

On a side note, I just rediscovered that ES5 strict mode forbids duplicate property names.

zamicol commented 2 years ago

Jordan, you've always done fantastic work on JSON5. Thank you for reading this proposal.

Perhaps JSON5 doesn't need to make the distinction? Instead, implementations MUST support unique, and SHOULD support last-value-wins, duplicate, and undefined, and leave it at that.

Looking at ES5, it looks like the default behavior is last-value-wins.

jordanbtucker commented 2 years ago

Thanks for the clarification. What is the value in defining an undefined behavior?

You're correct that the default behavior in ES5 is last-value-wins, unless it's in strict mode, then it's unique.
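For JSON text specifically, the built-in `JSON.parse` (introduced with ES5) shows the same last-value-wins default. A minimal check, using only the standard API:

```javascript
// JSON.parse keeps only the last value when a name is repeated.
const parsed = JSON.parse('{"bob": "bob", "bob": "bob2"}');

console.log(parsed.bob); // bob2
console.log(Object.keys(parsed).length); // 1
```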

nocturn9x commented 2 years ago

I don't think having the spec allow for undefined behavior is a great idea. The point of a specification is to lay down a standard, and especially in this case having UB is not only undesirable, but also probably useless.

Just my 2 cents though

zamicol commented 2 years ago

Sorry, my response was poorly written. This is better:

Implementations MUST support duplicate behavior unique. Implementations MAY support additional duplicate behaviors.

Perhaps "non-standard" is a better word here than "undefined". If a JSON5 implementation selects a duplicate behavior that is not unique, last-value-wins, or duplicate, it is said to be non-standard.

An implementation that uses a non-standard behavior, such as duplicate_behavior:random-wins, needs to be permissible by JSON5 to be JSON compatible, but JSON5 itself doesn't need to define "non-standard" behaviors.

You're correct that the default behavior in ES5 is last-value-wins, unless it's in strict mode, then it's unique.

What section is that mentioned in?

zamicol commented 2 years ago

Any more thoughts?

I suspect the next step is to draft an example by adding a few sentences about duplicate behavior to the spec section 3 Objects. Having a concrete spec to critique would be helpful.

jordanbtucker commented 2 years ago

You're correct that the default behavior in ES5 is last-value-wins, unless it's in strict mode, then it's unique.

What section is that mentioned in?

https://262.ecma-international.org/5.1/#sec-C

It is a SyntaxError if strict mode code contains an ObjectLiteral with more than one definition of any data property (11.1.5).


Implementations MUST support duplicate behavior unique.

JSON5 is facing the same issue TC39 had with Crockford's similar suggestion. There already exist JSON5 documents and JSON5 implementations that behave based on the current spec, which uses SHOULD.

The clause "Implementations MUST support duplicate behavior unique." is a breaking change for JSON5 implementations. Any implementation that was developed against the current spec and did not include unique support, including the reference implementation, would immediately become non-compliant.

zamicol commented 2 years ago

https://262.ecma-international.org/5.1/#sec-C

Ah! Thank you!

become non-compliant

Totally, and that's a great concern.

We've seen a "change" like this before:

The JSON RFC 7159 allowed UTF-8, UTF-16, or UTF-32.

The JSON RFC 8259 requires only UTF-8.

This decision was made after surveying implementations and noting that all popular implementations used UTF-8.

In the same way, we can survey implementations and see if any depend on other behaviors. For example, my guess would be that JavaScript and Go implementations have behavior last-value-wins and Java would be unique.

I'm not sure how much consideration implementers have given to duplicate behavior. It may be that they want behavior unique, but have implemented whatever was the most convenient in their language of choice.

It would be interesting to get a temperature reading of the community's feelings on the issue. What would be the best way to get a response? A survey? Opening a GitHub issue on the various implementations?

Is there a list of JSON5 implementations? That too might be a great place to start.

An alternative to forcing a specific behavior would be for the spec to simply define the behaviors unique, last-value-wins, duplicate, and non-standard, suggest that implementations SHOULD use unique, and leave it at that. Implementations should explicitly document which behavior is the default.


As a different matter, should I open up an issue in the reference implementation to support behavior unique? Alternatively, I can do a pull request just to document it for now.

jordanbtucker commented 2 years ago

Is there a list of JSON5 implementations? That too might be a great place to start.

Yes, there is a community-maintained list at In the Wild.

As a different matter, should I open up an issue in the reference implementation to support behavior unique? Alternatively, I can do a pull request just to document it for now.

Yes, that would be a welcome issue / PR. The reference implementation uses last-value-wins because that's what ES5 does, and I used the parsing steps from that spec to write the code. I don't think it's the best behavior, and I'm happy to include a unique option, which is off by default in this version, but perhaps on by default in v3.

zamicol commented 2 years ago

It is a SyntaxError if strict mode code contains an ObjectLiteral with more than one definition of any data property (11.1.5).

Perhaps this changed with ES6? Do you know of an example that will trigger SyntaxError on duplicates?

I suspected something like this would throw SyntaxError. Instead it results in last-value-wins. Looking at the ES6 spec, I can't find reference to duplicate behavior.

// Duplicate names in an object literal: no SyntaxError, the last value wins.
const tc = {'bob':'bob','bob':'bob2'};
console.log(tc); // prints `{bob: 'bob2'}`

jordanbtucker commented 2 years ago

It looks like they removed that restriction in ES6 with the introduction of computed property names. Since the following code would initialize an object with duplicate property names, using last-value-wins, with no way for the compiler to detect the duplicate names, ECMA decided to remove the restriction.

const key1 = 'foo'
const key2 = 'foo'

const object = {
  [key1]: 'bar',
  [key2]: 'baz',
}

console.log(object) // { foo: 'baz' }

See also https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#duplicate_property_names
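Because computed names can only be compared at runtime, code that wants unique semantics has to check while the object is being built. A minimal sketch of that idea follows; `fromUniqueEntries` is a hypothetical helper written for this example, not part of ECMAScript or any JSON5 library.

```javascript
// Hypothetical helper: builds an object from [key, value] entries and
// throws when a key repeats, instead of letting the last value win.
function fromUniqueEntries(entries) {
  const result = {};
  for (const [key, value] of entries) {
    if (Object.prototype.hasOwnProperty.call(result, key)) {
      throw new TypeError(`Duplicate property name: ${String(key)}`);
    }
    // defineProperty always creates an own property, even for a key like "__proto__".
    Object.defineProperty(result, key, {
      value,
      enumerable: true,
      writable: true,
      configurable: true,
    });
  }
  return result;
}

const k1 = 'foo';
const k2 = 'foo';

try {
  fromUniqueEntries([[k1, 'bar'], [k2, 'baz']]);
} catch (err) {
  console.log(err.message); // Duplicate property name: foo
}
```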