sg16-unicode / sg16

SG16 overview and general information

SG15 proposal for implementations that present source code to conform with UAX9-HL4 #75

Open tahonermann opened 2 years ago

tahonermann commented 2 years ago

As described in L2/22-072R: Proposal for amendments to UAX#9 and UAX#31, Visual Studio conforms to UAX9-HL4; it implements a higher-level protocol that splits source text into segments such that a character direction change that occurs within a token does not affect the presentation of the following token.

For example, consider the following expression:

x + y == 1

Both x and y have left-to-right directionality. If y is replaced with a right-to-left character such as U+05EA HEBREW LETTER TAV, then the presentation changes: the weakly directional characters that come after it are displayed right-to-left, producing a confusing display. The following is expected to present as x + 1 == <TAV>, with <TAV> replaced by the actual Hebrew letter, assuming GitHub hasn't implemented something like what Visual Studio has.

x + ת == 1
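
To see why the trailing characters get pulled along, it can help to look at the Bidi_Class of each character in the expression. The following is a small sketch using Python's standard `unicodedata` module; the helper name `bidi_classes` is made up for illustration and is not part of any tool mentioned here.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, Bidi_Class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# "\u05EA" is U+05EA HEBREW LETTER TAV, written as an escape so that this
# listing itself is not reordered when displayed.
for ch, cls in bidi_classes("x + \u05EA == 1"):
    print(f"U+{ord(ch):04X} {cls}")
```

Only x (class L) and TAV (class R) are strongly directional. The spaces, the operators, and the digit are weak or neutral, so their display direction is resolved from their neighbors, which is why replacing y with a right-to-left letter changes how `== 1` is presented.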

To counteract this, Visual Studio behaves as though a U+200E LEFT-TO-RIGHT MARK is inserted after each token. This restores the default directionality expected for source files and presents the code in a way that matches its semantics:

x + ת‎ == 1
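
As a rough illustration of the effect described above, a tool could append U+200E to each token of a display copy of the line. This is only a sketch under stated assumptions: the regular-expression tokenizer and the name `isolate_tokens` are hypothetical stand-ins for a real lexer, and nothing here is claimed to reflect how Visual Studio is actually implemented (it only behaves as though the marks were present).

```python
import re

LRM = "\u200E"  # U+200E LEFT-TO-RIGHT MARK

# Hypothetical toy tokenizer: a token is either a run of word characters
# (identifiers, numbers) or a run of non-space punctuation (operators).
TOKEN = re.compile(r"\w+|[^\w\s]+")

def isolate_tokens(line):
    """Return a display copy of line with an LRM appended to each token.

    Whitespace between tokens is left untouched, and the source text
    itself is never modified; only the string handed to the renderer is.
    """
    return TOKEN.sub(lambda m: m.group(0) + LRM, line)

# "\u05EA" is U+05EA HEBREW LETTER TAV.
display = isolate_tokens("x + \u05EA == 1")
print(ascii(display))
```

Because the marks are added only to the displayed copy, stripping them (`display.replace(LRM, "")`) recovers the original source line. A compiler could apply the same idea to the source fragment it echoes in a diagnostic.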

This is a useful behavior worthy of being implemented in any tool that displays source code, including compilers that present fragments of source files in diagnostics. See for an example of the opportunities for improvement (note that the caret in the diagnostic is not aligned with the identifier it is intended to indicate).

See also issue #74.