sg16-unicode / sg16


SG15 proposal for implementations that present source code to highlight invisible characters and right-to-left text #76

Open tahonermann opened 2 years ago

tahonermann commented 2 years ago

The Unicode Source Code Working Group (SCWG) has been discussing how tools (IDEs, editors, compilers, etc.) might call attention to invisible characters that have a semantic effect or that affect how source code is presented (potentially in a confusing, misleading, or malicious manner). The group has not yet issued any recommendations or guidelines; this issue is being filed to track the concern and call attention to it.

Concerns are particularly notable for source code that contains right-to-left (RTL) characters, including directional formatting characters such as U+200F RIGHT-TO-LEFT MARK (RLM), U+202B RIGHT-TO-LEFT EMBEDDING (RLE), U+202E RIGHT-TO-LEFT OVERRIDE (RLO), and U+202C POP DIRECTIONAL FORMATTING (PDF). The presence of these characters may affect the directional rendering of adjacent characters. This can be particularly confusing when they appear adjacent to placeholders in text formatting strings (e.g., printf, std::format, etc.), since the placeholders may then be rendered in right-to-left or mixed order.
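As a rough illustration of the detection side, here is a minimal Python sketch that reports where directional formatting characters occur in a piece of source text. The function name `find_bidi_controls` and the choice of which characters to flag are assumptions made for this example; they are not SCWG recommendations.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
# Which characters a tool should flag is an assumption of this sketch.
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF",
                         "LRI", "RLI", "FSI", "PDI"}
# Implicit directional marks: ALM, LRM, RLM.
DIRECTIONAL_MARKS = {"\u061C", "\u200E", "\u200F"}


def find_bidi_controls(text):
    """Return (index, 'U+XXXX', character name) for each bidi control found."""
    hits = []
    for index, ch in enumerate(text):
        if (unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES
                or ch in DIRECTIONAL_MARKS):
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, "U+%04X" % ord(ch), name))
    return hits


if __name__ == "__main__":
    # A format string whose placeholders are wrapped in RLO ... PDF.
    line = 'printf("\u202E%s %d\u202C")'
    for hit in find_bidi_controls(line):
        print(hit)
```

A tool could use positions like these to decide where to draw a marker or raise a diagnostic; how such findings should be presented is exactly the open question in this issue.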

Possible remedies for the presentation of such characters are to annotate them (e.g., to visually display [RLE] for a U+202B character), to suppress their semantic effect, and/or to visually depict their semantic effect (e.g., to display a directional arrow beneath the affected text, anchored to the otherwise invisible character).
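The first remedy, annotation, could look roughly like the following Python sketch. Replacing each control character with a visible tag also suppresses its directional effect in the displayed text. The helper name `annotate_bidi_controls` and the bracketed tag format are illustrative assumptions; the abbreviations are the usual ones for these code points.

```python
# Usual abbreviations for the directional formatting characters.
# The bracketed tag format is an assumption made for this sketch.
BIDI_ABBREVIATIONS = {
    0x061C: "ALM", 0x200E: "LRM", 0x200F: "RLM",
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF",
    0x202D: "LRO", 0x202E: "RLO",
    0x2066: "LRI", 0x2067: "RLI", 0x2068: "FSI", 0x2069: "PDI",
}

# str.translate accepts a mapping from code point to replacement string.
_ANNOTATION_TABLE = {cp: "[%s]" % abbr
                     for cp, abbr in BIDI_ABBREVIATIONS.items()}


def annotate_bidi_controls(text):
    """Replace each invisible bidi control with a visible tag such as [RLE]."""
    return text.translate(_ANNOTATION_TABLE)


if __name__ == "__main__":
    print(annotate_bidi_controls('std::format("\u202B{} {}\u202C", a, b)'))
```

This only changes what is displayed; the underlying source text is left untouched, so a tool would apply it in its rendering layer rather than when saving a file.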

Examples illustrating motivation are expected to be produced by the SCWG in future papers or Unicode releases.