Open G4Vi opened 2 months ago
Such clarifications need to be decided upstream
Given the corrigendum, an exception or conversion to the replacement character is certainly incorrect. Warning behavior would be up to preference. Several other tools handle this incorrectly as well.
JSON::PP does not warn in either case, nor does it do any replacement when decoding non-chars. Should we still ask them? I wouldn't mind stripping out the non-char warnings to match JSON::PP.
I would be in favor of matching JSON::PP, even if Unicode recommends otherwise.
Decoding an escaped non-character works as expected:
and produces a warning as described at https://metacpan.org/pod/Cpanel::JSON::XS#6.-Unicode-noncharacters-only-warn,-as-in-core.:
Decoding a regular non-character does not:
No warning is produced.
Nor is a warning produced when UTF-8 decoding is also done:
I'm undecided on whether it should warn when decoding without UTF-8 decoding. On one hand, these characters should have been discovered when they entered a Perl string (and may have warned already). On the other hand, not warning means that escaped and non-escaped non-chars are handled differently during JSON decoding. However, when UTF-8 decoding is done during JSON decoding, it seems pretty clear they should be handled the same.
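To compare the three cases side by side, something like the following sketch could be used. It is not the snippet originally posted in this issue. The `probe` helper and the case labels are hypothetical names, U+FFFF is just one example non-character, and the fallback to core JSON::PP is an assumption made so the script still runs where Cpanel::JSON::XS is not installed.

```perl
use strict;
use warnings;

# Assumption: use Cpanel::JSON::XS when it is installed, otherwise fall back
# to core JSON::PP so the sketch still runs.
my $class = eval { require Cpanel::JSON::XS; 1 }
    ? 'Cpanel::JSON::XS'
    : do { require JSON::PP; 'JSON::PP' };

# U+FFFF is one of the Unicode non-characters.
my $nonchar = do { no warnings 'utf8'; chr 0xFFFF };

# Hypothetical helper: decode a one-element JSON array and report the code
# points of the decoded string plus the number of warnings raised meanwhile.
sub probe {
    my ($decoder, $json) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $string = $decoder->decode($json)->[0];
    my $codepoints = join ' ',
        map { sprintf('U+%04X', ord $_) } split //, $string;
    return { codepoints => $codepoints, warnings => scalar @warnings };
}

my %results = (
    # \uFFFF escape inside the JSON text (single quotes keep the backslash)
    'escaped'          => probe($class->new,       '["\uFFFF"]'),
    # the non-character itself in a Perl character string, no UTF-8 decoding
    'raw characters'   => probe($class->new,       qq(["$nonchar"])),
    # the UTF-8 encoding of U+FFFF as octets, decoded by the ->utf8 option
    'raw UTF-8 octets' => probe($class->new->utf8, qq(["\xEF\xBF\xBF"])),
);

print "Using $class\n";
for my $case (sort keys %results) {
    printf "%-16s %s  warnings: %d\n",
        $case, @{ $results{$case} }{qw(codepoints warnings)};
}
```

If the warning count differs between the escaped row and the two raw rows, that is the inconsistency described above.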
Under https://metacpan.org/pod/Cpanel::JSON::XS#JSON-and-ECMAscript, is this paragraph outdated? "Unicode non-characters between U+FFFD and U+10FFFF are decoded either to the recommended U+FFFD REPLACEMENT CHARACTER (see Unicode PR #121: Recommended Practice for Replacement Characters), or in the binary or relaxed mode left as is, keeping the illegal non-characters as before." In my testing I never saw the non-characters be converted to the replacement character (nor do I think they should be).
I'm happy to make a PR after it's clarified how this should be handled.