Open yohannd1 opened 3 years ago
Should functions have semicolons at the end also?
fn main() void {
};
Edit: I've not kept up on #1717, so I'm not sure if the old way of writing functions is going away. I apologize for my ignorance.
Yeah, #1717 would change things, but maybe the old way will still be available? I'm not sure. If it is kept, yeah, it would have a semicolon at the end, since it's a statement that defines a function (as far as I know).
semicolons are already required at the end of statements. the situations you described that do not require one are not statements at all, they're expressions/blocks.
example
//expression
// conditional block in this case
if (a) { b(); } else { c(); }
//statement (semicolon required)
// assignment is the statement
// the value is the result of the conditional expression
const x = if (a) b else c;
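To make the two positions concrete, here is a minimal sketch that should compile with a reasonably recent Zig. The function `pick` and its values are hypothetical, made up only for this illustration; the point is just where the semicolon goes.

```zig
const std = @import("std");

// Hypothetical helper, used only for this illustration.
fn pick(a: bool) u8 {
    var calls: u8 = 0;

    // `if` in statement position with block bodies: no trailing semicolon.
    if (a) {
        calls += 1;
    } else {
        calls += 2;
    }

    // `if` in expression position: the semicolon terminates the `const` declaration.
    const x: u8 = if (a) 10 else 20;

    return x + calls;
}

test "if in statement position and in expression position" {
    try std.testing.expectEqual(@as(u8, 11), pick(true));
    try std.testing.expectEqual(@as(u8, 22), pick(false));
}
```

Running it with `zig test` exercises both branches.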
> semicolons are already required at the end of statements. the situations you described that do not require one are not statements at all, they're expressions/blocks.
They are statements according to the grammar; let's all use the terminology from the grammar to avoid confusion.
Statement
<- KEYWORD_comptime? VarDecl
/ KEYWORD_comptime BlockExprStatement
/ KEYWORD_nosuspend BlockExprStatement
/ KEYWORD_suspend (SEMICOLON / BlockExprStatement)
/ KEYWORD_defer BlockExprStatement
/ KEYWORD_errdefer BlockExprStatement
/ IfStatement
/ LabeledStatement
/ SwitchExpr
/ AssignExpr SEMICOLON
After taking a look at the grammar again, I noticed there is not only the `IfStatement` rule, but also the `IfExpr` rule, and they seem to be used in different contexts (I suppose `IfStatement` can't be used as an expression, but `IfExpr` can).
If I understood what @nektro meant, I suppose it's technically right, but I think the grammar could be tweaked in that regard.
I think with `IfStatement` removed, `IfExpr` can be used in its place just like `SwitchExpr`, which is one of the sub-rules in `Statement`.
To clarify more, there are three major levels which the grammar defines: top/decl level, block/statement level, and expression level. The first deals with unordered declarations inside a container type (note: files are container types as they are implicitly structs). The second deals with ordered statements in blocks/function bodies. This is where it is proposed to always require a semicolon. The final expression level appears in certain places within the other two.
Here is some example code to hopefully illustrate this better than words can manage:
// Top/Decl level. the following line is a declaration
const x = 42;
// This comptime block is also a declaration, we don't require a semicolon here
comptime {
    // Block/Statement level. the following line is a statement:
    const x = 42;
    // this line is also a statement, as is the `foo();` inside the block of the if:
    if (true) { foo(); }
    // Here we use an if expression inside a function call statement:
    bar(if (true) 1 else 2);
}
Another alternative is to do a Go-style automatic semicolon rule, which would make more semicolons required but fewer of them visible. To a certain extent it perhaps doesn't really matter what the final decision is, as long as it's consistent and the compiler makes a reasonable effort to tell users when they've got it wrong.
const x = 42 // compiler infers semicolon since `42` can end a declaration
comptime {
    const x = 42 // compiler infers semicolon since `42` can end a statement
    if (true) { foo() } // compiler infers semicolon after `)` and after closing brace, which mismatches today's grammar
    bar(if (true) 1 else 2) // compiler infers semicolon since `)` can end a statement
} // compiler infers semicolon after closing brace, which mismatches today's grammar
Well, as for automatic semicolons, I think this has been discussed before (https://github.com/ziglang/zig/issues/7938, https://github.com/ziglang/zig/issues/3188, https://github.com/ziglang/zig/issues/483). It seems most people don't want semicolons (I personally don't either), but I agree with you on prioritizing consistency.
I prefer semicolons rather than newlines when it comes to ending statements. (newlines are blank characters, just like spaces)
From a language purist's perspective this sounds justified. But from a practical perspective, this would terribly hurt adoption of Zig by people coming from C, C++ and many other languages. Having to write a semicolon after each block just feels so obtrusive...
> newlines are blank characters, just like spaces
Yes, but these blank chars have a very important meaning. Spaces can separate two code tokens. Thanks to the space, `const x` has a very different meaning than `constx`. And like spaces separating tokens, newlines can separate statements. Therefore I do not see any benefit in forcing semicolons after every statement. IMO.
Seems to me that the actual issue is that some things that look like they shouldn't require semicolons actually do, like:
defer if (true) {};
I think it is reasonable to be confused by this, so it seems like this is the actual issue that should be solved. Playing around a bit, we can change the grammar to this to make this code compile:
Statement
<- KEYWORD_comptime? VarDecl
/ KEYWORD_comptime Statement
/ KEYWORD_nosuspend Statement
/ KEYWORD_suspend Statement
/ KEYWORD_defer Statement
/ KEYWORD_errdefer Payload? Statement
/ IfStatement
/ LabeledStatement
/ SwitchExpr
/ AssignExpr SEMICOLON
Ofc, this is not a full solution and this needs to be checked for edge cases.
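For reference, this is how I understand the behaviour under the current grammar, as a minimal sketch. The function `run` and the counter are hypothetical, only there so the effect of each `defer` can be observed.

```zig
const std = @import("std");

// Hypothetical function, for illustration only.
fn run(log: *u8, enabled: bool) void {
    // `defer` followed by a block: no semicolon after the closing brace.
    defer {
        log.* += 1;
    }

    // `defer` followed by an `if` expression: the semicolon is required,
    // even though the line ends with a closing brace.
    defer if (enabled) {
        log.* += 10;
    };
}

test "defer with a block and with an if expression" {
    var log: u8 = 0;
    run(&log, true);
    // Both deferred actions ran on scope exit.
    try std.testing.expectEqual(@as(u8, 11), log);
}
```

The second form is the one that tends to surprise people, since it looks like a block but is parsed as an expression followed by a semicolon.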
I'll argue for the opposite:
var
a
:
i32
=
foo(
a
,
b
,
c
) +
8
As someone who got used to Erlang's use of punctuation, I can live with most choices. I think I probably slightly prefer explicit semicolons after everything. Just don't make them "optional". There is no reason to repeat the disaster that optional semicolons wrought in Javascript.
However, requiring the semicolon would seem to make parsing recovery after syntax errors a lot easier for tooling. That's probably more newbie-friendly than the actual fact of having more semicolons than C/C++.
After all, people got used to Rust where the existence or not of a semicolon changes what your return value is. That is far more intrusive than requiring semicolons everywhere.
As already stated, semicolons are not super-easy to type with a non-ANSI layout.
I really like zig, but having used languages without semicolons for the last ~7 years, going back to semicolons gives me PTSD of the C ages.
Honestly with Go, absence of semicolons works pretty well, and I still can't find a real argument against removing them other than "code clarity" but that shouldn't depend on the semicolon TBH.
I have seen a lot of issues regarding this and strong opinions from both sides, but not a single official answer from the team on which is the intended direction. I understand that making a decision is not simple, but having a clear direction would be nice for people who want to approach the language.
> I think I probably slightly prefer explicit semicolons after everything. Just don't make them "optional". There is no reason to repeat the disaster that optional semicolons wrought in Javascript.
As with many things Javascript, the implementation of "automatic semicolon insertion" is uniquely bad. Lua and Julia are two languages which do what I would prefer to see Zig do also: a newline terminates an expression, a semicolon is allowed, and to write on one line what would otherwise require a newline, you must use a semicolon. There's no ambiguity, because semicolons aren't "inserted" via a hard-to-understand rule. They're allowed where a newline is required, and are required if the newline isn't present.
This does mean that some expressions have to be written with parentheses which would otherwise not be needed. Another option is backslash-escaping a newline to make a multi-line expression behave like a single line. I've found the tradeoff to be well worth it.
I have been using Scala for ~10 years (and some Kotlin too, which has the same behaviour). It has had optional semicolons, basically like what @mnemnion described, since I started using it. I never had a problem of ambiguity. One thing that can be unintuitive was
1 + //compiles, since the compiler finds the rhs in the next line
1
1
+ 1 // does not compile
In Scala 3 this is gone, since we have meaningful whitespace
1
+ 1 // has a leading space, so it is part of the expression of the line above.
I'm not saying Zig should adopt meaningful whitespace, but optional semicolons are no practical issue.
This is a bit of a meander about syntax, specifically, if expressions. My premise is that if any part of the language would suffer from optional semicolons, it would be if expressions.
Background: when writing C, I have an ironclad rule to always put braces around the prongs of if statements. It's just too easy to add another statement and silently take an unconditional action; there have been some famous bugs from this, such as 'goto fail'.
The problem in a nutshell:
// starts like this:
if (someCondition())
    whenTrue();
// becomes this
if (someCondition())
    whenTrue();
    oops(); // executed unconditionally.
However, this is also valid Zig. The reason I don't follow that rule with Zig code is actually cultural: like ~everyone else, I use `zig fmt`, it triggers on save, and the indentation would tell me I did something wrong. It's still a somewhat dangerous construct. My personal style is that an if-only branch with a single expression should go on a single line, so I use the style above only when there's an else:
if (someCondition())
    whenTrue()
else
    whenFalse();
Here we get a compile error if something is added to the true prong, but this would be true in C also. We've pushed the problem to the false prong, and once again, it's formatting which comes to the rescue and makes this tolerably safe.
So my conclusion is that the absence of semicolons wouldn't harm this construct at all. The example would just be this:
// still caught by formatting, if anything
if (someCondition())
    whenTrue()
    oops()
Or this:
if (someCondition())
    whenTrue()
    oops() // still a compile error
else
    whenFalse()
So nothing has actually changed.
Other cases where semicolons appear to be semantically meaningful don't, on close examination, appear to matter at all.
Like this one:
// bare switch:
switch (anEnum) {
    .fee => sayFee(),
    .fie => sayFie(),
    .foe => sayFoe(),
} // <- Semicolon not allowed
// Assigning switch
const said = switch(anEnum) {
    .fee => "fee",
    .fie => "fie",
    .foe => "foe",
}; // <- Semicolon is mandatory
If semicolons were optional, the first one would still be an illegal place to put a semicolon, and the second one would allow it but not require it. That would be neither harder to parse, nor more difficult to read and understand.
In other words, semicolons and their lack enforce a distinction between expressions and statements, but I haven't found a place where they uniquely determine that difference. That is, a place where adding a semicolon changes a statement to an expression without any further changes and results in a valid program with a different meaning.
TL;DR, I've convinced myself that semicolons could be made completely optional in Zig, except when multiple expressions are placed on the same line. It would be a 100% backward compatible change, requiring no alteration to any existing programs, and it wouldn't harm understanding of semicolon-free programs, or result in any new dangerous ambiguities. If there's a counterexample, I'm eager to hear it.
As a note, I've worked professionally with and on Parsing Expression Grammars for years, so this is a change I could plausibly make myself. From reading the grammar, it should be literally as cheap as making the semicolon rule accept a newline as well as a semicolon.
Making it all function is likely to be enough work that I would want some reasonable buy-in on the idea before proceeding, but I'd like the core team to consider this as an offer, rather than a request or, heaven forfend, a demand.
> I never had a problem of ambiguity. One thing that can be unintuitive was
> 1 + //compiles, since the compiler finds the rhs in the next line
> 1
> 1
> + 1 // does not compile
Well drat, I take it back about optional semicolons not invalidating any currently well-formed programs.
But I think that in practice this is no problem at all, because of, once again, `zig fmt`. I just checked, and, no surprise, it puts that sort of spread-out math statement onto one line.
A breaking change is a breaking change, though. I'm fairly sure that Zig doesn't enforce a rule that code must be run through `zig fmt`, and without that, there are conceivable programs which would not compile with optional semicolons.
As a practical matter, the number of actual programs to which this would apply could be as few as "none". If there's an example of a possible program which would compile, but would change meaning, that could be a show-stopper.
This would still compile:
const result = 1 + // semicolon illegal here, just whitespace
1;
This would not:
const result = 1 // valid semicolon, inserted
+ 1; // invalid fragment, compiler error
This would call for a careful analysis to see if there are cases which wouldn't fall into one or the other category.
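For comparison, here is a small sketch of today's behaviour as I understand it: newlines are plain whitespace, so both spellings are the same expression and both compile. The constant names are made up for this illustration.

```zig
const std = @import("std");

// Operator at the end of the first line.
const trailing_op: u32 = 1 +
    1;

// Operator at the start of the second line. Under a rule where a newline
// terminates the statement, this is the spelling that would stop compiling.
const leading_op: u32 = 1
    + 1;

test "both spellings are the same expression today" {
    try std.testing.expectEqual(@as(u32, 2), trailing_op);
    try std.testing.expectEqual(@as(u32, 2), leading_op);
}
```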
I'll save some time (would have been better to read the linked issues thoroughly before I went off like this):
// when does this invoke some_function()?
if (some_long_condition()) return
some_function();
// same
for (some_slice) |x| {
    if (some_long_condition()) break
    some_function();
}
These are genuine ambiguities. An optional-semicolon grammar would count the newline after the `return` and `break` as a terminator, which would be wrong, and the program would compile.
Only mitigation here is good ol' `zig fmt`, which turns those statements into this:
if (some_long_condition()) return some_function();
// same
for (some_slice) |x| {
    if (some_long_condition()) break some_function();
}
But this weakens the proposal considerably. If you add enough characters to `some_long_condition`, it won't reformat to a single line, either.
Detectable? Yes, this is detectable, using a before-and-after parse, and I would venture that all ambiguities would be detectable that way.
Bad style? Absolutely, no one should be stranding a break or return statement that way, it's awful style.
Worth it? Maybe. But breaking change? Well and truly would be.
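A minimal sketch of the `return` case under today's grammar, to show what existing code means right now. The two helper functions are hypothetical stand-ins for the names used above, and I've given `some_function` a return value (an assumption, not part of the original example) so the behaviour is observable in a test.

```zig
const std = @import("std");

// Hypothetical stand-ins for the names used in the example above.
fn some_long_condition() bool {
    return true;
}

fn some_function() u32 {
    return 7;
}

fn today() u32 {
    // Today the newline is plain whitespace, so this parses as
    // `return some_function();`.
    if (some_long_condition()) return
        some_function();
    return 0;
}

test "stranded return still returns the call result today" {
    try std.testing.expectEqual(@as(u32, 7), today());
}
```

With a non-void return type, a newline-terminated `return` would presumably become a compile error rather than a silent change. The dangerous variant is the void one from the example above, where both readings would compile.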
Note: I'm making some assumptions about the language, especially about the grammar. I've read part of the grammar but I'm not very confident that I understood it well, so if I've made a mistake please correct me.
There's a thing in zig that has been bothering me for a while: depending on the situation, it's either needed or forbidden to put a semicolon. As an example:
(There are other situations where this might show up, but I think this one I showed is one of the most interesting.)
I think this might cause quite a lot of confusion for beginners but, not only that, it might still be a little bit annoying for people that are already used to Zig.
My solution to this is to require semicolons at the end of every statement. With this change, what we currently have as:
Would become:
The grammar definition would also benefit from this, since currently the `SEMICOLON` token is declared in several different ways to accommodate how each kind of statement works.
Wrapping up, here's a list of pros and cons I could think of with the implementation of this proposal as-is.
Pros
`if` from C/C++'s `if`
`if` is an expression, and only the body of the last expression should have a semicolon, if it's not a block, switch or if statement.
`if` is a statement, and each branch should end with a semicolon.
Cons
Note: I've also thought about removing the requirement of having semicolons at the end of a statement that ends with braces but I think it's harder to understand, the grammar gets more complex and the example below would be ambiguous: