Open tahonermann opened 3 years ago
Actually, the standard does supply a list of whitespace characters in [lex.pptoken]p2:
... Preprocessing tokens can be separated by whitespace; this consists of comments ([lex.comment]), or whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. ...
and again in [lex.token]p1:
... Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “whitespace”), as described below, are ignored except as they serve to separate tokens.
[Note 1: Some whitespace is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters. — end note]
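As a rough illustration of the list quoted from [lex.pptoken]p2, the five named characters can be written as a predicate. This is only a sketch: the name `is_lex_whitespace` is hypothetical, and treating new-line as a single LINE FEED is an assumption, since the standard's "new-line" need not correspond to one code point.

```cpp
// Hypothetical helper: tests for the five whitespace characters named in
// [lex.pptoken]p2. Not part of the standard or of any proposal.
constexpr bool is_lex_whitespace(char32_t c) {
    return c == U' '    // space
        || c == U'\t'   // horizontal tab
        || c == U'\n'   // new-line (assumed here to be U+000A LINE FEED)
        || c == U'\v'   // vertical tab
        || c == U'\f';  // form-feed
}
```

Note that under this reading CARRIAGE RETURN and every non-ASCII space fall outside the set.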
Note that 'new-line' there is already a term of art. It possibly includes various combinations of
On Tue, Mar 23, 2021 at 4:14 PM Tom Honermann @.***> wrote:
Actually, the standard does supply a list of whitespace characters in [lex.pptoken]p2 http://eel.is/c++draft/lex#pptoken-2:
... Preprocessing tokens can be separated by whitespace; this consists of comments ([lex.comment]), or whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. ...
Later revisions of P2295 no longer address this.
P2348, an early draft of which is available at https://isocpp.org/files/papers/D2348R0.pdf, rewords the handling of whitespace and new-lines without extending the set of whitespace characters.
This issue was discussed on the Unicode.org mailing list. There was a recommendation from a Unicode expert that, for programming languages, Pattern_White_Space
may be a useful starting point, but that it might make sense to drop the U+200E and U+200F bidirectional markers and add U+3000 (IDEOGRAPHIC SPACE).
The total feedback was a single response, though.
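For concreteness, the recommendation from the mailing list could be sketched as follows. The Pattern_White_Space code points are the eleven listed in the UCD's PropList.txt; the function names are hypothetical, and the adjusted set is only one reading of the suggestion (drop U+200E and U+200F, add U+3000), not something any paper has proposed in this form.

```cpp
// Code points with the Pattern_White_Space property (UCD PropList.txt).
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEXT LINE
        || c == 0x200E || c == 0x200F    // LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
        || c == 0x2028 || c == 0x2029;   // LINE SEPARATOR, PARAGRAPH SEPARATOR
}

// The two bidirectional markers the recommendation suggests dropping.
constexpr bool is_bidi_mark(char32_t c) {
    return c == 0x200E || c == 0x200F;
}

// Hypothetical adjusted set: Pattern_White_Space minus the bidirectional
// markers, plus U+3000 IDEOGRAPHIC SPACE.
constexpr bool is_suggested_whitespace(char32_t c) {
    return (is_pattern_white_space(c) && !is_bidi_mark(c)) || c == 0x3000;
}
```

One consequence of adding U+3000 is that the resulting set is no longer a subset of Pattern_White_Space, so it would not inherit that property's stability guarantee automatically.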
The C++ standard defines behavior that depends on whether a character constitutes white-space, but never defines what those characters are. Uses of the "whitespace" and "white-space" terms appear in:
P2178 proposal 2 sought to clarify the set of characters that constitute white-space and proposed the following set. These characters all satisfy the immutable Pattern_White_Space property (see UAX #44 and/or search for Pattern_White_Space in the UCD).
The above set of characters excludes the following characters that satisfy the (not immutable) White_Space property (see UAX #44 and/or search for White_Space in the UCD).
When addressing this issue, we may want to take the opportunity to replace the existing "whitespace" and "white-space" terminology with "blank space"; ISO guidance may require such a renaming in the future.