Open tahonermann opened 2 years ago
I updated the issue title to extend this issue to cover the inclusion of all of the following characters in whitespace. This would suffice for C++ to meet the Pattern_White_Space requirements of UAX31-R3.
Additionally, inclusion of U+061C ARABIC LETTER MARK (ALM) should be considered, as it is conceptually similar to LRM and RLM, though it is not a member of the Pattern_White_Space property (and cannot be added, because that property is immutable). Including this character in whitespace would require specifying a profile in [uaxid.pattern] for conformance with UAX31-R3.
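To make the distinction concrete, here is a minimal sketch of a whitespace classifier with an optional ALM extension. The Pattern_White_Space code points are the ones listed in the Unicode Character Database; the `whitespace_profile` enum and the function names are hypothetical, chosen only for illustration, and are not proposed wording or an existing API.

```cpp
// Code points with the Unicode Pattern_White_Space property.
// The property is immutable, so this set is fixed.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEXT LINE
        || c == 0x200E || c == 0x200F    // LRM, RLM
        || c == 0x2028 || c == 0x2029;   // LINE SEPARATOR, PARAGRAPH SEPARATOR
}

// Hypothetical switch standing in for "a profile in [uaxid.pattern]".
enum class whitespace_profile { pattern_white_space_only, with_alm };

// Under the with_alm profile, U+061C ARABIC LETTER MARK is accepted in
// addition to the Pattern_White_Space code points.
constexpr bool is_whitespace_candidate(char32_t c, whitespace_profile p) {
    return is_pattern_white_space(c)
        || (p == whitespace_profile::with_alm && c == 0x061C);
}
```

Without the profile, ALM is rejected because it is outside Pattern_White_Space; accepting it is exactly the deviation that a profile would have to declare.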
Unicode paper L2/22-072R, "Proposal for amendments to UAX#9 and UAX#31", adopted for the upcoming Unicode 15 release, demonstrates the utility of allowing U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM) to appear in whitespace, but not to constitute whitespace in isolation. The intent is to allow these marks to be inserted in whitespace in order to restore character directionality that might have been altered by characters in the preceding token.
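One possible reading of "appear in whitespace, but not constitute whitespace in isolation" is sketched below: a run is accepted only if every code point has Pattern_White_Space and at least one of them is something other than LRM or RLM. This is an interpretation for illustration, not the wording of L2/22-072R or of the C++ standard, and all function names are hypothetical.

```cpp
#include <string_view>

// LRM and RLM: permitted inside whitespace, but not sufficient on their own.
constexpr bool is_directional_mark(char32_t c) {
    return c == 0x200E || c == 0x200F;
}

// Code points with the Unicode Pattern_White_Space property.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085
        || c == 0x200E || c == 0x200F || c == 0x2028 || c == 0x2029;
}

// Assumed rule: the whole run must be Pattern_White_Space, and it must
// contain at least one code point that is not a directional mark.
// An empty run is rejected.
constexpr bool is_acceptable_whitespace_run(std::u32string_view run) {
    bool has_non_mark = false;
    for (char32_t c : run) {
        if (!is_pattern_white_space(c)) return false;
        if (!is_directional_mark(c)) has_non_mark = true;
    }
    return has_non_mark;
}
```

Under this sketch, a space followed by an LRM is accepted, while an LRM wedged directly between two tokens with no other whitespace is not.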