Status: Closed (joe-warren-permutive closed 6 months ago)
I've checked a range of different JSON printers.
Firefox does escape LS and PS (and treats them differently to other characters in the same block):
> JSON.stringify("\u2026")
'"…"'
> JSON.stringify("\u2027")
'"‧"'
> JSON.stringify("\u2028")
'"\u2028"'
> JSON.stringify("\u2029")
'"\u2029"'
> JSON.stringify("\u202a")
'""'
However, jq, Circe, Node.js, and Chrome print those values unescaped.
I don't think that's a compelling argument for or against.
I'd be happy to make a PR for the fix myself, but this would be my first time contributing to Aeson, so I'm keen to figure out if it would be accepted before starting work.
This issue from 2015 heavily implies that "only escaping values required by the spec" is a deliberate design decision.
I'm leaning away from PRing this, as it breaks the current property that strings are encoded canonically, according to RFC-8785, and any changes to the string escaping would require copying the current logic into Data.Aeson.RFC8785.
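To illustrate why the escaping choice matters for a canonical encoding: the two JSON texts below decode to the same string but are different character sequences, so anything computed over the encoded output (a hash or a signature, say) would change if the escaping rules changed. This is a minimal JavaScript sketch, independent of Aeson; the variable names are only for illustration.

```javascript
// Two JSON texts for the same one-string value.
const literalForm = '"a\u2028b"';   // LS (U+2028) written literally inside the string
const escapedForm = '"a\\u2028b"';  // LS written as a \u escape sequence

// Both decode to the same JavaScript string...
console.log(JSON.parse(literalForm) === JSON.parse(escapedForm)); // true

// ...but the encoded texts themselves differ.
console.log(literalForm === escapedForm);            // false
console.log(literalForm.length, escapedForm.length); // 5 10
```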
Any choice here would be arbitrary, as you note that different implementations do different things. There is some reasoning behind the current choice, and I'm sure many are also (implicitly) depending on the current behavior.
The Unicode standard, chapter 5.8, lists 7 different types of newline character.
The string escaping code in Data.Aeson.Text.hs appears to escape 4 out of 7 of these characters: the characters that are not escaped are NEL (x0085), LS (x2028) and PS (x2029).
I've encountered at least one parser that treats these values as a newline, and will therefore fail when encountering them unescaped in a JSON string.
I'd like to suggest updating the escaping logic, so that these characters would be escaped.
RFC-8259 section 7 is fairly clear that these do not need to be escaped:
However, it also states:
I'd suggest that escaping all newline characters would be more robust.
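As a rough illustration of the suggestion (not Aeson's code), the sketch below post-processes the output of `JSON.stringify` so that NEL, LS and PS are emitted as `\uXXXX` escapes. It is written in JavaScript to match the console session in the thread, and `escapeNewlineLikes` is a hypothetical helper name. It assumes its input is valid JSON text: there, these three characters can only occur inside strings, so a global replace does not touch anything else.

```javascript
// Hypothetical helper, not part of any library: escape the three newline-like
// characters that JSON does not require to be escaped.
function escapeNewlineLikes(json) {
  // NEL (U+0085), LS (U+2028), PS (U+2029)
  return json.replace(/[\u0085\u2028\u2029]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const value = { text: "a\u0085b\u2028c\u2029d" };
const escaped = escapeNewlineLikes(JSON.stringify(value));

console.log(escaped);
// {"text":"a\u0085b\u2028c\u2029d"}

// The escaped form still decodes to the original string.
console.log(JSON.parse(escaped).text === value.text); // true
```

A parser that treats raw NEL, LS or PS as line breaks would then only ever see ASCII escape sequences for them, while any conforming JSON parser decodes the same value as before.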