json5 / json5-spec

The JSON5 Data Interchange Format
https://spec.json5.org
MIT License

JSON5SingleStringCharacter: Paragraph separator not allowed in JSON5, but in RFC8259 #22

Kijewski closed this issue 4 years ago

Kijewski commented 4 years ago

RFC 8259 allows unescaped line separators (U+2028) and paragraph separators (U+2029) in strings:

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

JSON5 forbids unescaped line and paragraph separators in strings:

JSON5DoubleStringCharacter:: SourceCharacter but not one of " or \ or LineTerminator

Where ECMAScript 5.1 states:

Line Terminator Characters: \u000A, \u000D, \u2028, \u2029

This means there is valid JSON data that is invalid in JSON5.

The easiest fix would be to disallow only unescaped U+000A and U+000D characters, and to allow U+2028 and U+2029.

Compare: https://github.com/Kijewski/pyjson5/commit/e29648073b723f69bb6da6059e767e63ad40e3c3/checks?check_suite_id=357322911
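For readers who want to check the RFC 8259 side of this claim, here is a minimal sketch in Python. The helper name is_rfc8259_unescaped is hypothetical and simply transcribes the "unescaped" ABNF rule quoted above; the second half uses the standard library json module, which accepts both separators unescaped inside a string.

```python
import json

# Hypothetical helper: a direct transcription of the RFC 8259 rule
#   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
def is_rfc8259_unescaped(ch: str) -> bool:
    cp = ord(ch)
    return 0x20 <= cp <= 0x21 or 0x23 <= cp <= 0x5B or 0x5D <= cp <= 0x10FFFF

LS = "\u2028"  # line separator
PS = "\u2029"  # paragraph separator

# Both separators fall inside the last range (%x5D-10FFFF).
ls_allowed = is_rfc8259_unescaped(LS)
ps_allowed = is_rfc8259_unescaped(PS)

# The JSON text below contains the raw characters, not \u escapes,
# because Python resolves the escapes before json.loads sees the text.
json_text = '"a' + LS + "b" + PS + 'c"'
decoded = json.loads(json_text)
```

This only illustrates what plain JSON permits; it says nothing about how any particular JSON5 parser behaves.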

jordanbtucker commented 4 years ago

Please see 5.2 Paragraph and Line Separators.

Kijewski commented 4 years ago

IMO, JSON5(Single|Double)StringCharacter should then link to this section instead of ECMAScript's documentation, to make it obvious that ECMAScript's documentation is in fact not the relevant spec for unescaped line and paragraph separators in string literals.

jordanbtucker commented 4 years ago

The JSON5DoubleStringCharacter production clearly lists U+2028 and U+2029 as matching characters, and section 5.2 reinforces this, so I'm not sure where the confusion is.

Kijewski commented 4 years ago

The link:

<a href="https://www.ecma-international.org/ecma-262/5.1/#sec-7.3">LineTerminator</a>

jordanbtucker commented 4 years ago

Yes, LineTerminator includes U+2028 and U+2029, which means the first rule in JSON5DoubleStringCharacter does not match those characters. That is why that production explicitly includes those characters later. They are matched by a later rule in that production.
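The ordered-alternatives reading described in this last comment can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the escape-sequence and line-continuation alternatives of the production are left out, and the explicit U+2028 and U+2029 alternatives are modelled as the comment describes them rather than copied from the spec text.

```python
# ECMAScript 5.1 LineTerminator characters, as quoted earlier in the thread.
LINE_TERMINATORS = {"\u000A", "\u000D", "\u2028", "\u2029"}

def matches_first_alternative(ch: str) -> bool:
    """SourceCharacter but not one of " or \\ or LineTerminator."""
    return ch not in {'"', "\\"} and ch not in LINE_TERMINATORS

def matches_explicit_separator(ch: str) -> bool:
    """Later alternatives that list U+2028 and U+2029 explicitly."""
    return ch in {"\u2028", "\u2029"}

def is_unescaped_double_string_char(ch: str) -> bool:
    """A single raw character is accepted if any alternative matches.

    Escape sequences and line continuations are omitted from this sketch.
    """
    return matches_first_alternative(ch) or matches_explicit_separator(ch)

# U+2028 fails the first alternative but is accepted by a later one.
first_rule_accepts_ls = matches_first_alternative("\u2028")
production_accepts_ls = is_unescaped_double_string_char("\u2028")
# U+000A is rejected by every alternative modelled here.
production_accepts_lf = is_unescaped_double_string_char("\u000A")
```

Under this reading, the link to LineTerminator affects only the first alternative, which is why raw U+2028 and U+2029 are still accepted by the production as a whole.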