sg16-unicode / sg16

SG16 overview and general information

[format.string.escaped] does not specify boundary conditions for sequences of ill-formed code units #80

Open tahonermann opened 1 year ago

tahonermann commented 1 year ago

[format.string.escaped]p2.2 states:

For each code unit sequence X in S that either encodes a single character, is a shift sequence, or is a sequence of ill-formed code units, processing is in order as follows:

What constitutes a "sequence of ill-formed code units" is not specified. That is fine for implementation-defined encodings, but a precise definition could be specified for UTF-8, UTF-16, and UTF-32.

Unicode PR-121 provides a definition for "entire ill-formed subsequence" that is a good candidate for how a "sequence of ill-formed code units" might be defined:

In these policy statements, "entire ill-formed subsequence" refers to all code units in the ill-formed subsequence up to but not including the start of the next well-formed code unit sequence.
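
To make the boundary question concrete, here is a minimal sketch of how that definition could be applied to UTF-8. It splits a byte string into well-formed code unit sequences and maximal runs of ill-formed code units, where each run ends just before the start of the next well-formed sequence. The names `Segment`, `well_formed_length`, and `segment_utf8` are hypothetical and do not come from the standard or from PR-121; the byte ranges follow the well-formed UTF-8 byte sequence table in the Unicode Standard. This illustrates one possible reading, not proposed wording.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical type: one element of the segmentation of S.
struct Segment {
    std::size_t begin;   // offset of the first code unit
    std::size_t length;  // number of code units
    bool well_formed;    // false for an "entire ill-formed subsequence"
};

// Returns the length of the well-formed UTF-8 sequence that starts at s[i],
// or 0 if no well-formed sequence starts there. Requires i < s.size().
std::size_t well_formed_length(std::string_view s, std::size_t i) {
    auto byte = [&](std::size_t k) -> unsigned {
        return static_cast<unsigned char>(s[k]);
    };
    // True if s[k] exists and lies in [lo, hi].
    auto in = [&](std::size_t k, unsigned lo, unsigned hi) {
        return k < s.size() && byte(k) >= lo && byte(k) <= hi;
    };
    const unsigned b0 = byte(i);
    if (b0 <= 0x7F) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) return in(i + 1, 0x80, 0xBF) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        const unsigned lo = (b0 == 0xE0) ? 0xA0 : 0x80;  // reject overlong forms
        const unsigned hi = (b0 == 0xED) ? 0x9F : 0xBF;  // reject surrogates
        return (in(i + 1, lo, hi) && in(i + 2, 0x80, 0xBF)) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        const unsigned lo = (b0 == 0xF0) ? 0x90 : 0x80;  // reject overlong forms
        const unsigned hi = (b0 == 0xF4) ? 0x8F : 0xBF;  // reject > U+10FFFF
        return (in(i + 1, lo, hi) && in(i + 2, 0x80, 0xBF) &&
                in(i + 3, 0x80, 0xBF)) ? 4 : 0;
    }
    return 0;  // 0x80..0xC1 and 0xF5..0xFF never start a well-formed sequence
}

// Splits s into well-formed sequences and maximal ill-formed runs.
std::vector<Segment> segment_utf8(std::string_view s) {
    std::vector<Segment> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n = well_formed_length(s, i);
        if (n > 0) {
            out.push_back({i, n, true});
            i += n;
        } else if (!out.empty() && !out.back().well_formed) {
            ++out.back().length;  // extend the current ill-formed run
            ++i;
        } else {
            out.push_back({i, 1, false});  // start a new ill-formed run
            ++i;
        }
    }
    return out;
}
```

Under this reading, the input `"a\xC0\xAF\xE2\x82z"` yields three segments: the well-formed `a`, a single ill-formed run covering the four code units `C0 AF E2 82`, and the well-formed `z`. A different definition could split the same four code units into several ill-formed sequences, which is the ambiguity this issue is about.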