Closed vrischmann closed 3 years ago
I can reproduce the issue on a Linux system, provided that I save the file with DOS newlines (CR+LF). Likely the lexer is choking on the unexpected CR.
Easy steps to reproduce:
vi dosnl.zig
set fileformat=dos
:x
to save + quitzig build-obj dosnl.zig
A patch as simple as this seems to make the tokenizer happy:
diff --git a/lib/std/zig/tokenizer.zig b/lib/std/zig/tokenizer.zig
index 94a20d958..edf690ebe 100644
--- a/lib/std/zig/tokenizer.zig
+++ b/lib/std/zig/tokenizer.zig
@@ -834,7 +834,7 @@ pub const Tokenizer = struct {
},
.multiline_string_literal_line => switch (c) {
- '\n' => {
+ '\r', '\n' => {
self.index += 1;
break;
},
I'll see if there's any tests where I can record the regression and send a PR out for review.
I have the same issue on zig-windows-x86_64-0.9.0-dev.397+b7da1b2d4
Thanks for the patch @jmc-88, however I reverted it because this was working as designed. Carriage returns are discouraged in general, removed by zig fmt
, and in the case of multi-line string literals, not allowed at all. If you need carriage returns in a string literal, you have to use a clunkier syntax.
See https://github.com/ziglang/zig-spec/issues/38 for clarification. I opened an issue because the draft spec and the current compiler implementation do not match up exactly; this issue has the canonical specification.
If no CRs are allowed in multi-line string literals, does this mean it's impossible to include multi-line string literals in source files with DOS newlines without an editor plugin?
I'm not sure what you mean about an editor plugin, but yes, if you need a string with DOS newlines, it would require a strategy such as:
\r
and the ++
operator (this would look similar to C code accomplishing the same task)@embedFile
I mean, if my source file was saved with DOS newlines and it includes multiline strings, the newlines in them would become CRLF and thus rejected by the compiler, as in this issue. But I don't mean to verbatim include those CRLFs in my string, I just want them to be standard '\n'
s. To solve this I'll need to either save the file with UNIX newlines, or use some editor plugin that converts CRs in multiline strings to NL.
I know language such as Javascript accepts all styles of newlines in multiline string literals, but they will get converted to '\n'
, making the newlines unambiguous. Maybe this should also be zig's case?
Thanks for the patch @jmc-88, however I reverted it because this was working as designed. Carriage returns are discouraged in general, removed by
zig fmt
, and in the case of multi-line string literals, not allowed at all. If you need carriage returns in a string literal, you have to use a clunkier syntax.
OK, but then looking at https://github.com/ziglang/zig-spec/issues/38, '\t'
characters should be similarly disallowed, while currently the tokenizer ignores them? From the above it sounds this line should also disappear.
In any case, if certain bytes are disallowed it may help to improve the error message to point out which invalid bytes were found and rejected.
I mean, if my source file was saved with DOS newlines and it includes multiline strings, the newlines in them would become CRLF and thus rejected by the compiler, as in this issue. But I don't mean to verbatim include those CRLFs in my string, I just want them to be standard '\n's. To solve this I'll need to either save the file with UNIX newlines, or use some editor plugin that converts CRs in multiline strings to NL.
I think what @andrewrk suggests is that you need to apply zig fmt
as part of your development workflow, to automatically convert these bytes sequences for you.
I just got back to this. My issue is that git on windows has core.autocrlf=true
by default (at least on my vanilla installation) which now will cause code to not compile. Simply cloning like this works: git clone --config core.autocrlf=false
but I don't think this is ideal.
Also, I can't run zig fmt
on my file either:
PS C:\Users\vincent\dev\tmp\invalid-bytes> C:\Users\vincent\dev\zig-windows-x86_64-0.9.0-dev.453+7ef854682\zig.exe fmt .\main.zig
.\main.zig:3:17: error: expected ',', found invalid bytes
\\foobar
^
.\main.zig:3:18: note: invalid byte: ' '
\\foobar
^
.\main.zig:3:17: error: expected expression, found 'invalid'
\\foobar
^
.\main.zig:3:18: note: invalid byte: ' '
\\foobar
^
.\main.zig:4:6: error: expected test, comptime, var decl, or container field, found ';'
};
^
.\main.zig:5:12: error: expected ',', found ';'
_ = bar;
^
.\main.zig:6:1: error: expected 'eof', found '}'
}
^
I think what @andrewrk suggests is that you need to apply
zig fmt
as part of your development workflow, to automatically convert these bytes sequences for you.
zig fmt
rejects them too, as ziglang/zig-spec#38 suggests:
- zig fmt may not mangle multi line string literals, and therefore the control characters TAB and CR are rejected by the grammar inside multi-line string literals.
This has really sent me down a rabbit hole trying to get certain repositories to work on Windows! I think this is one of those times where striving for simplicity is actually making things harder in certain cases.
It's quite a subtle thing to try and debug for someone not already aware of the issue, especially given the non-visible nature of whitespace. Is there a way for this to fail gracefully? Either by a conversion prompt or a tweak to how things are handled?
I must point out that this has confused me, as well. CR
is mostly invisible to the programmer, so it gets confusing when your code suddenly won't compile on Windows and the compiler is complaining about an invalid byte in your string literal that you can't see.
I agree with throwing compile errors when the programmer themself makes a mistake. However, I do think that this particular issue is less about the programmer and more about compatibility with software made by third parties. And even though most proper text editors have settings for choosing between CRLF
or just LF
, I still think that it's important to support any could-be outliers. In other words: The language belongs to Zig, but the format for encoding text files doesn't. Zig merely supports it, and holds no more power over the encoding than the average text editor does.
@Hadron67 mentioned the idea of simply converting the line endings of string literals to LF
. I agree with that idea. While it technically is a "hidden element", it does simplify things overall and shouldn't be hard to document. It means that you can write the exact same code in any text editor and get the exact same result, regardless of whether the editor uses CRLF
or LF
. I'm fine with using \r
when I really have to, as long as everything else is consistent.
However, if the compiler really must reject CR
like this, I'd appreciate it if it were at least being less confusing about it. Rejecting CR
in multiline string literals specifically seems arbitrary and confusing, so I honestly think that it'd be better to just outright reject CR
everywhere, period. Other than that, simply making the compile error a bit more specific (such as error: expected ',', found invalid byte 'CR'
) seems like a safe bet.
I suggest we re-open this issue.
One more data point: https://lists.sr.ht/~motiejus/bazel-zig-cc/%3C167179347159.25324.12223141818448163553-0%40git.sr.ht%3E#%3Cee9821ec-6b7c-4d62-a1f2-0b54cc2ef728@hahn.graphics%3E
Extract from @FabianHahn email:
Zig further fails to compile launcher.zig with the following error:
launcher.zig:70:73: error: expected ';' after declaration
\\Usage: <...>/tools/<target-triple>/{[zig_tool]s}{[exe]s} <args>...
^~~~
launcher.zig:71:3: note: invalid byte: ' '
\\
^
The reason for this is that Git on Windows inserts carriage return characters before newlines when checking out local files so they show up fine in Windows text editors where \r\n is the default for line breaks. This then breaks the zig parser's support for multiline constants as detailed in https://github.com/ziglang/zig/issues/9257 which is marked as closed even though it doesn't look resolved to me at all. Converting newlines is the default behavior for Git unless you explicitly disable it by setting core.autocrlf in your Git config. While that would be a workaround, I would find this an uncommon thing to do and an odd thing to require of Windows users, which is something also pointed out in the above issue.
That should be covered by #11414
Hi,
the following code fails to compile only on Windows:
I get this error:
I'm testing with
zig-windows-x86_64-0.9.0-dev.321+15a030ef3.zip
.Weird thing is it works fine on Linux and FreeBSD.