ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Proposal: Multiple Values In Escape Sequences #17385

Open exxjob opened 10 months ago

exxjob commented 10 months ago

Preferring escape sequences to raw UTF-8 in source is a common coding standard, one reason being security. Directionality marks, dingbats, emojis, diacritics, logograms, notations, controls... shouldn't or can't be printed in source files in many contexts. Currently, successive Unicode codepoints written as escape sequences look like this:

const a = "\u{a1f3b}\u{a1f3c}\u{a1f3d}\u{a1f3e}\u{a1f3f}";

The proposal is to support multiple values in escape sequences with this syntax:

const b = "\u{a1f3b\ a1f3c\ a1f3d\ a1f3e\ a1f3f}";

This is easier and safer to read and write. A backslash marks the end of each codepoint. This would also apply to #17376 if accepted. See comment https://github.com/ziglang/zig/issues/17376#issuecomment-1745072369
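
For comparison, a minimal sketch of what the status quo guarantees, together with a workaround that is possible today (comptime `++` on single-escape literals). The proposed form appears only in a comment, since it is not valid Zig at the moment; the constant names `status_quo` and `concatenated` are illustrative only.

```zig
const std = @import("std");

// Status quo: one \u{...} escape per codepoint.
const status_quo = "\u{a1f3b}\u{a1f3c}";

// Workaround available today: comptime concatenation of single-escape literals.
const concatenated = "\u{a1f3b}" ++ "\u{a1f3c}";

// Proposed (NOT valid Zig today, shown for comparison only):
//     "\u{a1f3b\ a1f3c}"

test "each escape expands to the UTF-8 encoding of its codepoint" {
    // U+A1F3B and U+A1F3C both lie above U+FFFF, so each takes 4 UTF-8 bytes.
    try std.testing.expectEqualStrings("\xf2\xa1\xbc\xbb\xf2\xa1\xbc\xbc", status_quo);
    // Splitting the literal with ++ yields the same bytes.
    try std.testing.expectEqualStrings(status_quo, concatenated);
}
```

Under the proposal, the multi-value form would presumably produce the same bytes as `status_quo`; only the spelling in source changes.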

exxjob commented 7 months ago

To clear up ambiguities:

This proposal would also make it intuitive to handle Unicode grapheme clusters, ZWJ / VS15 / VS16 emojis, and other needs. The minutiae can be changed around, but I think this is approximately the right way to go about it.

rohlem commented 3 months ago

Is there precedent for \ meaning multiple elements? I agree that it's an improvement, but just naively looking at this, I think simply delimiting elements via a comma , (maybe with an optional space after it) would look even more readable to me personally. (I assume a parser already has to enter a special state to implement this, so I don't think giving special meaning to , in this context would affect performance much, if that was the reason.)

exxjob commented 3 months ago

@rohlem it's more about not introducing regrettable ambiguities into string literals.