sg16-unicode / sg16

SG16 overview and general information

SG15 proposal for implementations that present source code to conform with UAX9-HL4 #75

Open tahonermann opened 2 years ago

tahonermann commented 2 years ago

As described in L2/22-072R: Proposal for amendments to UAX#9 and UAX#31, Visual Studio conforms to UAX9-HL4; it implements a higher-level protocol that splits source text into segments such that a character direction change that occurs within a token does not affect presentation of the following token.

For example, consider the following expression:

x + y == 1

Both x and y have left-to-right directionality. If y is replaced with a right-to-left character such as U+05EA HEBREW LETTER TAV, the presentation changes: the neutral and weakly directional characters that come after it are displayed right-to-left, producing a confusing display. The following is expected to present as x + 1 == <TAV>, with <TAV> replaced by the actual Hebrew letter, assuming GitHub hasn't implemented something like what Visual Studio has.

x + ת == 1
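For anyone who wants to check the character classes involved, here is a minimal Python sketch that lists the Unicode bidirectional class of each non-space character in the expression. The helper name `bidi_classes` is made up for illustration; `unicodedata.bidirectional` is the standard library call. It only reports the classes; it does not run the bidirectional algorithm itself.

```python
import unicodedata

def bidi_classes(text):
    """Pair each non-space character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]

# "\u05ea" is U+05EA HEBREW LETTER TAV, written as an escape so that this
# snippet itself is not reordered when displayed.
for ch, cls in bidi_classes("x + \u05ea == 1"):
    print(f"U+{ord(ch):04X} {cls}")
```

The TAV is reported as R (strong right-to-left), `=` as ON (neutral), and `1` as EN (weak), which is why the `== 1` that follows TAV is pulled into the right-to-left run.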

To counteract this, Visual Studio behaves as though a U+200E LEFT-TO-RIGHT MARK is inserted after each token. This restores the default directionality expected for source files and presents the code in a way that matches its semantics:

x + ת‎ == 1
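A rough sketch of how a tool could approximate this when preparing a line for display is below. It is not Visual Studio's implementation. The regex tokenizer and the name `isolate_tokens` are assumptions made for illustration, and a real tool would use its own lexer. The marks belong only in the string handed to the renderer, never in the file on disk.

```python
import re

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Hypothetical, deliberately crude tokenizer: a token is either a run of
# word characters (identifiers, numbers) or a run of punctuation.
TOKEN = re.compile(r"\w+|[^\w\s]+")

def isolate_tokens(line: str) -> str:
    """Return a display copy of `line` with an LRM appended after every token."""
    return TOKEN.sub(lambda m: m.group(0) + LRM, line)

# "\u05ea" is U+05EA HEBREW LETTER TAV.
display_line = isolate_tokens("x + \u05ea == 1")
print(display_line)
```

Because every token is followed by a strong left-to-right character, a right-to-left identifier can no longer influence the ordering of the operators and literals after it. Removing the marks from `display_line` gives back the original source text.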

This is a useful behavior worthy of being implemented in any tool that displays source code, including compilers that present fragments of source files in diagnostics. See https://godbolt.org/z/MM1xE5dM1 for an example of the opportunities for improvement (note that the caret in the diagnostic is not aligned with the identifier it is intended to indicate).

See also issue #74.