ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.65k stars 2.53k forks source link

Make trailing whitespace at the end of multiline strings an error #19299

Open wooster0 opened 7 months ago

wooster0 commented 7 months ago

Status quo

Status quo of trailing whitespace:

test {}  
x.zig:1:9: error: expected type expression, found 'invalid bytes'
test {}  
        ^
x.zig:1:9: note: invalid byte: '\xe3'

There is a normal space followed by a full-width space.

Technically it'd be useful if zig fmt formatted away trailing Unicode whitespace as well but the Zig compiler currently doesn't really have much Unicode awareness. With that in mind, this is what happens for multiline strings:

test {
    const ascii_spaces = \\               
    ;
    _ = ascii_spaces;
    const unicode_full_width_spaces = \\              
    ;
    _ = unicode_full_width_spaces;
}

zig fmt does not format anything away and the compiler doesn't error.

Trailing whitespace as it is handled right now is in my opinion perfectly fine except for multiline strings.

Proposal

I propose zig fmt to not change and the compiler to complain, for a start, when there is ASCII whitespace at the end of a multiline string.

The reason is that trailing whitespace at the end of a multiline string is almost always a mistake and the intent is rather unclear (Communicate intent precisely). Was the intent to have this trailing whitespace or not?

Example: https://github.com/ziglang/zig/blob/778ab767b1e367233a1a3284e2c24d2c27b44602/lib/compiler/aro/aro/Driver.zig#L111-L116

test {
    const help_menu =
        \\  -fno-ms-extensions      Disable support for Microsoft extensions
        \\  -fdollars-in-identifiers        
        \\                          Allow '$' in identifiers
        \\  -fno-dollars-in-identifiers     
        \\                          Disallow '$' in identifiers
        \\  -fmacro-backtrace-limit=<limit>
    ;
    _ = help_menu;
}

The -fdollars-in-identifiers and -fno-dollars-in-identifiers lines have trailing ASCII spaces. Select the text to see them. Mistakes like this are common in multiline strings and even your editor won't format them away for you if it knows about Zig multiline strings (and so knows that the spaces are part of a string, which are part of the program and have semantic implications).

Was this really a mistake or did iddev5 (https://github.com/Vexu/arocc/pull/152/files) mean to put these ASCII spaces there?

The intent is more or less unclear and so I propose the compiler to behave like this:

test {
    const help_menu =
        \\  -fno-ms-extensions      Disable support for Microsoft extensions
        \\  -fdollars-in-identifiers        
        \\                          Allow '$' in identifiers
        \\  -fno-dollars-in-identifiers     
        \\                          Disallow '$' in identifiers
        \\  -fmacro-backtrace-limit=<limit>
    ;
    _ = help_menu;
}
$ zig test x.zig
x.zig:4:44: error: trailing whitespace in multiline string
        \\  -fdollars-in-identifiers        
                                           ^
x.zig:4:10: note: use a singleline string if you meant to put this trailing whitespace

Meaning, the fix would be:

test {
    const help_menu =
        \\  -fno-ms-extensions      Disable support for Microsoft extensions
        \\  -fdollars-in-identifiers
    ++ "        " ++
        \\                          Allow '$' in identifiers
        \\  -fno-dollars-in-identifiers
    ++ "     " ++
        \\                          Disallow '$' in identifiers
        \\  -fmacro-backtrace-limit=<limit>
    ;
    _ = help_menu;
}

Now intent is very clear. You specifically concatenated spaces in between (for whatever reason) so clearly you intended to do this. Because "" strings are singleline, you can inspect whatever is in between those two double quotes and you don't have to worry about trailing whitespace in multiline strings that you didn't see.

Here, again, as above it would technically be useful if it errors for Unicode whitespace as well. But because (as above) zig fmt doesn't format away trailing Unicode whitespace, I think it's fine for the compiler to only complain for ASCII spaces for now, that is until zig fmt starts formatting away trailing Unicode whitespace as well.

Reasoning

The rules "Communicate intent precisely" and "Favor reading code over writing code".

When I say non-ASCII Unicode trailing whitespace, I am also talking about Related Unicode characters with property White_Space=no (second table), such as zero-width spaces. They fit this category as well.

nektro commented 7 months ago

as long as there can be a way such that zig build-* rejects it but zig fmt can still accept and fix it, then I support this.

matklad commented 3 months ago

+1, we regularly step into this in TigerBeetle, where there's an un-noticed trailing whitespace in \\hello world (which is annoying in and of itself), which then subsequently leaks leaks into whatever we are using this string literal for.

I wanted to quickly fix this today, but, after about an hour hacking, I think I'll give up for now :-)

I did come up with some tests here:

https://github.com/matklad/zig/compare/854e86c5676de82bc46b5c13a0c9c807596e438d...6d2506bd1f5f1edfcb18bb9f0024c3b5034d0349

Note for whoever wants to fix this next (perhaps future me?):

Sadly, I was smart enough to come up with all those ideas, but not smart enough to realize that they don't work before implementing them, so I'll leave it at that for now :rofl:

Some further ideas:

mlugg commented 3 months ago

Yeah, I'd expect this to be done as an AstGen error.

nektro commented 3 months ago

another benefit of making it an AstGen error is that zig fmt could autofix it

mlugg commented 3 months ago

@nektro per the literal first sentence of the proposal, we will not have zig fmt modify such code (emphasis my own):

I propose zig fmt to not change and the compiler to complain, for a start, when there is ASCII whitespace at the end of a multiline string.

Having zig fmt silently perform a hard-to-spot, potentially functional, change to code is not a good idea. The whole point of this error is that the intent is unclear. Thus, it would be completely irrational for zig fmt to assume that intent.

nektro commented 3 months ago

if the user wants trailing space they should not be using a multiline string. the whole point of this proposal is that it'd be completely invisible to both the writer and reader of the code. barely any software warns about it and zig fmt asserting/protecting you from it would not be a bad thing

edit: oh i see i already left this opinion when this was first opened so ill stop my comments here

thomab commented 3 months ago

For what it's worth, I don't like it. If I declare a string literal, then most likely that is literally the string that I want.

An example where this would be unfortunate is if you're generating markdown (or similar) files from templates, and you want to include explicit line-breaks (two trailing spaces) in the template.

Couldn't you lint this on your own machines or ci?

bash-5.2$ cat temp.zig
const _ =
    \\hello
    \\world
    \\oops
    \\foo
    \\bar
;
bash-5.2$ grep -EHn '^\s*\\\\.*\s+$' temp.zig
temp.zig:4:    \\oops
matklad commented 3 months ago

An example where this would be unfortunate is if you're generating markdown (or similar) files from templates, and you want to include explicit line-breaks (two trailing spaces) in the template.

This is precisely the case where it is beneficial to see trailing white-space marked explicitly in the source code. It's hard to spot a bug in

    writeMarkdown(
        \\Roses are red  
        \\  Violets are blue,  
        \\Sugar is sweet
        \\  And so are you.
    );
thomab commented 3 months ago

see trailing white-space marked explicitly

But the proposal is to forbid it, not to mark it.

matklad commented 3 months ago

We don't need a new feature to mark trailing whitespace because there is a number of existing language features you can use for this niche purpose sufficiently effectively:

// Use normal string, the perfect solution for this specific example
    writeMarkdown("" ++
        "Roses are red  \n" ++
        "  Violets are blue,  \n" ++
        "Sugar is sweet  \n" ++
        "  And so are you.  \n");

// For one off line break, the following works
    writeMarkdown(
        \\Roses are red
        \\  Violets are blue,
    ++ "  \n" ++
        \\Sugar is sweet
        \\  And so are you.
    );

// If you need this a lot for a particular use-case, you can make domain-specific helper:
    writeMarkdown(pre(
        \\Roses are red
        \\  Violets are blue,
        \\Sugar is sweet
        \\  And so are you.
    ));
    writeMarkdown(
        \\Roses are red {br}
        \\  Violets are blue,  {br}
        \\Sugar is sweet  {br}
        \\  And so are you.  {br}
    );

// Finally, it might be worth to stop and think whether you actually _must_ 
// use a trailing whitespace, or whether there's a better solution:
    writeMarkdown(
        \\Roses are red\
        \\  Violets are blue,\
        \\Sugar is sweet\
        \\  And so are you.\
    );
thomab commented 3 months ago

niche purpose

That is probably true, but there aren't very many people arguing either way here.

\\Roses are red\

I suppose could live with this.

Edited: Grepping a large code base such as zig or tigerbeetle for this takes milliseconds on my machine, though it's only a few years old, I understand it can take some amount longer on older hardware or laptops. Probably not that much longer, though.

thomab commented 3 months ago

I wanted to see if looking for this at build time is feasible and came up with this proof of concept. It's rough and not portable but, it seems like it could be done.

diff --git a/build.zig b/build.zig
index 054b67b..b1ea75c 100644
--- a/build.zig
+++ b/build.zig
@@ -46,6 +46,7 @@ pub fn build(b: *std.Build) !void {

     // Top-level steps you can invoke on the command line.
     const build_steps = .{
+        .trail = b.step("trail", "Look for trailing whitespace"),
         .aof = b.step("aof", "Run TigerBeetle AOF Utility"),
         .check = b.step("check", "Check if TigerBeetle compiles"),
         .clients_c = b.step("clients:c", "Build C client library"),
@@ -257,6 +258,28 @@ pub fn build(b: *std.Build) !void {
         break :blk install.getDest();
     };

+    { // zig build trail
+        var code: u8 = undefined;
+        const grep = b.runAllowFail(&.{
+            "grep",
+            "-EHnr",
+            \\^\s.*\s+$
+            ,
+            "--include=*.zig",
+        }, &code, .Inherit) catch |err| switch (err) {
+            error.ExitCodeFailure => if (code == 1) "ok" else blk: {
+                std.debug.print("{}\n", .{code});
+                break :blk "what";
+            },
+            else => return err,
+        };
+        std.debug.print("{}\n", .{code});
+        if (code > 2) {
+            std.debug.print("{s}\n", .{grep});
+            return error.LintError;
+        }
+    }
+
     { // zig build aof
         const aof = b.addExecutable(.{
             .name = "aof",
diff --git a/src/vopr.zig b/src/vopr.zig
index da11e4c..414b56f 100644
--- a/src/vopr.zig
+++ b/src/vopr.zig
@@ -234,7 +234,7 @@ pub fn main() !void {
         \\          SEED={}
         \\
         \\          replicas={}
-        \\          standbys={}
+        \\          standbys={} 
         \\          clients={}
         \\          request_probability={}%
         \\          idle_on_probability={}%

And the output:

$ zig build trail
170
src/vopr.zig:237:        \\          standbys={} 

error: LintError
/home/thomas/Tools/tigerbeetle/build.zig:279:13: 0x11212c9 in build (build)
            return error.LintError;
            ^
/home/thomas/zig/0.13.0/files/lib/std/Build.zig:2117:24: 0x1103d37 in runBuild__anon_8835 (build)
        .ErrorUnion => try build_zig.build(b),
                       ^
/home/thomas/zig/0.13.0/files/lib/compiler/build_runner.zig:301:9: 0x10ff060 in main (build)
        try builder.runBuild(root);
        ^
error: the following build command failed with exit code 1:
/home/thomas/Tools/tigerbeetle/.zig-cache/o/5ba6ffbcb74aeb9315c6bd2f8977a108/build /home/thomas/zig/0.13.0/files/zig /home/thomas/Tools/tigerbeetle /home/thomas/Tools/tigerbeetle/.zig-cache /home/thomas/.cache/zig --seed 0x9d32edc7 -Zbd117d5a66a8de31 trail
thomab commented 3 months ago

For what it's worth, I know I don't have much of a leg to stand on but I didn't want this to go through without mentioning it.