Proposal: Multiple Values In Escape Sequences

ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

https://ziglang.org

MIT License


Proposal: Multiple Values In Escape Sequences #17385

Open exxjob opened 10 months ago

exxjob commented 10 months ago

Preferring escape sequences to raw UTF-8 in source is a common coding standard, one reason being security. Directionalities, dingbats, emojis, diacritics, logograms, notations, controls... shouldn't or can't be printed in source files in many contexts. Currently, successive Unicode code points written as escape sequences look like this:

const a = "\u{a1f3b}\u{a1f3c}\u{a1f3d}\u{a1f3e}\u{a1f3f}";

The proposal is to support multiple values in escape sequences with this syntax:

const b = "\u{a1f3b\ a1f3c\ a1f3d\ a1f3e\ a1f3f}";

This is easier and safer to read and write. A backslash marks the end of a code point and separates it from the next one. This would also apply to #17376 if accepted. See comment https://github.com/ziglang/zig/issues/17376#issuecomment-1745072369

exxjob commented 7 months ago

To clear up ambiguities:

Escape sequences as described in the proposal cannot be multiline - that would be unnecessarily abstruse.
The backslash delimiter is used in place of a comma to distinguish it as part of an escape sequence within a literal. To avoid further confusion, a trailing delimiter (which would produce a \} sequence) may be forbidden.
17584.
Spaces after { or after a delimiter should be respected as formatting. Presumably this won't necessitate zig fmt changes. Visual underscore separators would be good for symmetry with number literals and for highlighting Unicode planes and ranges. https://github.com/ziglang/zig/issues/17376#issuecomment-1745072369
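
For illustration only, the rules above can be sketched as a small function in today's Zig that takes the text between \u{ and } and produces the UTF-8 bytes. This is a minimal sketch, not the compiler's tokenizer: expandMultiEscape and its error names are hypothetical, the buffer-based interface is an assumption, and it assumes a standard library recent enough to have std.mem.splitScalar (Zig 0.11 or later).

```zig
const std = @import("std");

/// Hypothetical helper, not part of Zig or its standard library.
/// `body` is the text between `\u{` and `}` in the proposed syntax.
/// Writes the UTF-8 encoding of each code point into `out` and returns the
/// number of bytes written. `out` needs room for 4 bytes per code point.
fn expandMultiEscape(body: []const u8, out: []u8) !usize {
    // Rule: the escape sequence cannot span multiple lines.
    if (std.mem.indexOfScalar(u8, body, '\n') != null) return error.MultilineEscape;

    var len: usize = 0;
    var parts = std.mem.splitScalar(u8, body, '\\');
    while (parts.next()) |part| {
        // Rule: spaces after `{` or after a delimiter are formatting only.
        var start: usize = 0;
        while (start < part.len and part[start] == ' ') start += 1;
        const digits = part[start..];

        // Rule: no trailing delimiter. An empty part also covers a leading
        // or doubled delimiter.
        if (digits.len == 0) return error.EmptyCodePoint;

        // parseInt ignores `_` between digits, which gives the underscore
        // separators for free. It also accepts a leading sign, so a real
        // tokenizer would need to be stricter than this sketch.
        const cp = try std.fmt.parseInt(u21, digits, 16);
        len += try std.unicode.utf8Encode(cp, out[len..]);
    }
    return len;
}

test "proposed form expands to the same bytes as the current form" {
    var buf: [32]u8 = undefined;
    const n = try expandMultiEscape("a1f3b\\ a1f3c\\ a1f3d", &buf);
    try std.testing.expectEqualStrings("\u{a1f3b}\u{a1f3c}\u{a1f3d}", buf[0..n]);
}

test "trailing delimiter is rejected" {
    var buf: [32]u8 = undefined;
    try std.testing.expectError(error.EmptyCodePoint, expandMultiEscape("a1f3b\\", &buf));
}
```

Splitting on the backslash and rejecting empty parts handles the trailing-delimiter case without any lookahead, which is part of why forbidding it seems cheap.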

This proposal would also make it intuitive to handle Unicode grapheme clusters, ZWJ / VS15 / VS16 emojis, and other needs. The minutiae can be changed around, but I think this is approximately the right way to go about it.
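
As a concrete case, here is a ZWJ emoji sequence in the current syntax, with the proposed spelling shown only in a comment since it is not valid Zig today. The example string (woman, zero width joiner, laptop) and the constant name are my own choices for illustration.

```zig
const std = @import("std");

// Current syntax: U+1F469 WOMAN, U+200D ZERO WIDTH JOINER, U+1F4BB PERSONAL COMPUTER.
const technologist = "\u{1F469}\u{200D}\u{1F4BB}";

// Under the proposal (hypothetical, does not compile today) the same
// grapheme cluster would be written as a single escape:
//     "\u{1F469\ 200D\ 1F4BB}"

test "ZWJ sequence is three code points in eleven UTF-8 bytes" {
    // 4 bytes + 3 bytes + 4 bytes.
    try std.testing.expectEqual(@as(usize, 11), technologist.len);
    try std.testing.expectEqual(@as(usize, 3), try std.unicode.utf8CountCodepoints(technologist));
}
```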

rohlem commented 3 months ago

Is there precedent for \ meaning multiple elements? I agree that it's an improvement, but just naively looking at this, I think simply delimiting elements with a comma , (maybe with an optional space after it) would look even more readable to me personally. (I assume a parser already has to enter a special state to implement this, so I don't think giving special meaning to , in this context would affect performance much, if that was the reason.)

exxjob commented 3 months ago

There is precedent for \\ denoting sequenced elements, in the form of multiline strings.
There is precedent for \ delimiting escape sequences within a string literal.
There is precedent for , being a character within a string literal.

@rohlem it's more about not introducing regrettable ambiguities into string literals.
