ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Restore block expressions #9758

Closed: acarrico closed this issue 2 years ago.

acarrico commented 3 years ago

While learning Zig, I expected blocks to be expressions, which is how the manual documents them: "Blocks are used to limit the scope of variable declarations. Blocks are expressions."

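But unlabeled blocks are not currently treated as value-producing expressions: only a labeled block can yield a value (via break), and an unlabeled block always evaluates to void.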

Digging in, I discovered that Zig apparently did have expression-oriented blocks back in 2017, but they were removed in favor of statement-oriented blocks in #629. I propose that (unlabeled) block expressions be returned to the language:

Zig is a block-structured language, and expressions are easier to reason about than statements. Is there any reason not to do this? My impression is that the change which removed block expressions was mostly about cleaning up exceptional control flow. Was removing block expressions a necessary part of that cleanup, or more of an unintended consequence?

Just to illustrate the syntax:

Currently:

test "block expression label" {
    const three: u8 = blk: { break :blk 3; };
    try expect(three == 3);
}
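For readers who want to try the current behavior, here is a self-contained sketch, assuming a recent Zig toolchain run with zig test (the function name labeledThree is hypothetical). It shows that a labeled block yields the operand of its break, while an unlabeled block is still an expression but always has type void:

```zig
const std = @import("std");
const expect = std.testing.expect;

// Labeled block: `break :blk value` supplies the block's result.
fn labeledThree() u8 {
    const three: u8 = blk: {
        const one: u8 = 1;
        break :blk one + 2;
    };
    return three;
}

test "labeled block yields a value" {
    try expect(labeledThree() == 3);
}

test "unlabeled block evaluates to void" {
    // Still an expression, but there is no way to give it a non-void value.
    const v = {
        var tmp: u8 = 1;
        tmp += 1;
    };
    try expect(@TypeOf(v) == void);
}
```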

Proposal: one or both of these should succeed:

test "block expression" {
    const three: u8 = { 3 };
    try expect(three == 3);
}

test "block expression or block statement" {
    const three: u8 = { 3; };
    try expect(three == 3);
}

Either option is fine, whichever best fits the grammar and semantics; it seems that the first one, with no semicolon, was the old syntax. (As amended below: semicolons should be accepted as both a separator and a terminator in lists of expressions/statements.)

acarrico commented 3 years ago

More points:

InKryption commented 3 years ago

Regarding your third point, where you bring up #8019: it's important to note that that proposal exists not because blocks are currently insufficient, but because using blocks to avoid scope pollution is ugly and tedious to write, so most people will omit the extra scope, whereas a "with(...)" syntax or equivalent would encourage less scope pollution. So I don't think this proposal really approaches that issue in the same way.

As for your proposed reinstatement of _ = { expr } or its variant _ = { expr; }: IIRC that syntax was inspired by Rust's block return syntax, and it was removed because it is ambiguous. For example, accidentally leaving a semicolon after the value you meant to return from the block would lead to unintuitive compiler errors about assigning void and the like. That's off the top of my head, so I'm most likely forgetting some other very important detail(s).

To build on that, why not propose allowing a plain break expr; without a label as a legal way to return from a block, as a middle ground between ambiguity and verbosity? I'll admit I've been thinking about this, but hadn't really thought to put it into a proposal.

acarrico commented 3 years ago

@InKryption thanks for the input. Answering your three comments/questions:

test "trailing comma" {
    const Point = struct { x: i32, y: i32 };
    const terminating = Point{ .x = 0, .y = 1, };
    const separating = Point{ .x = 0, .y = 1 };
    try expect(terminating.y == separating.y);
}

ghost commented 3 years ago

I've never understood the decision to go ahead with #629 either. The concerns voiced there all seem rather theoretical. Sure, you can forget the semicolon at the end. Or you can accidentally write {3} instead of .{3}. But so what? Zig is statically typed and will catch such errors easily. If there were some actual footguns involved here, that would be different. But I don't actually see any.

I think Rust's approach is quite reasonable and doesn't take long to get used to. Zig's approach, on the other hand, with breaks and gratuitous block labels, is both clunky and unpopular. There have been, for example, proposals to introduce a result keyword (#732), to allow unlabeled break on all blocks (#2990) or at least in situations where there is no conflict with loops (#5382), and to avoid labels by using break => val (#5083). Maybe it's time to stop looking for workarounds and go back to the most simple, consistent and obvious solution?
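The "conflict with loops" mentioned above comes from the fact that, in current Zig, an unlabeled break always targets the innermost enclosing loop, never a block. A minimal sketch (clampedSum and sumUntilNegative are hypothetical names) showing why a value-producing block inside a loop has to be labeled today, while an unlabeled break inside a nested block exits the whole loop:

```zig
const std = @import("std");

// Hypothetical example: sums `items`, treating negative values as zero.
fn clampedSum(items: []const i32) i32 {
    var total: i32 = 0;
    for (items) |x| {
        // Inside a loop, an unlabeled `break` already means "exit the loop",
        // so a block that produces a value here has to be labeled.
        const clamped: i32 = blk: {
            if (x < 0) break :blk 0;
            break :blk x;
        };
        total += clamped;
    }
    return total;
}

// Hypothetical example: an unlabeled `break` in the `if` block exits the `for`.
fn sumUntilNegative(items: []const i32) i32 {
    var total: i32 = 0;
    for (items) |x| {
        if (x < 0) {
            break;
        }
        total += x;
    }
    return total;
}

test "labeled vs unlabeled break inside a loop" {
    const items = [_]i32{ 3, -2, 4 };
    try std.testing.expect(clampedSum(&items) == 7);
    try std.testing.expect(sumUntilNegative(&items) == 3);
}
```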

ikskuh commented 3 years ago

Maybe it's time to stop looking for workarounds and go back to the most simple, consistent and obvious solution?

I found the Rust code always very hard to read as i have to actually look at line endings instead of just not reading them. The blk: { ... break :blk foo; } construct is orthogonal and way more flexible than the "last statement" approach:

const foo = blk: {
    if (use_default)
        break :blk default_value;
    var sum: usize = 0;
    for (some_magic_list) |item| {
        if (condition(item))
            break :blk item.special;
        sum += item.value;
    }
    break :blk sum;
};

Can you make a simple example with the proposed syntax how this would turn out?
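A self-contained version of the example above, for anyone who wants to compile it. The names Item, is_special and compute are hypothetical stand-ins: the comment leaves the element type and condition(item) undefined, so this sketch models the condition as a boolean field and passes use_default, default_value and some_magic_list as parameters:

```zig
const std = @import("std");

// Hypothetical element type standing in for the items of `some_magic_list`.
const Item = struct {
    value: usize,
    special: usize,
    is_special: bool,
};

// `condition(item)` from the comment is modelled as `item.is_special`.
fn compute(use_default: bool, default_value: usize, some_magic_list: []const Item) usize {
    const foo = blk: {
        if (use_default)
            break :blk default_value;
        var sum: usize = 0;
        for (some_magic_list) |item| {
            // Leaves the labeled block directly from inside the loop.
            if (item.is_special)
                break :blk item.special;
            sum += item.value;
        }
        break :blk sum;
    };
    return foo;
}

test "labeled block with several exits" {
    const items = [_]Item{
        .{ .value = 1, .special = 0, .is_special = false },
        .{ .value = 2, .special = 0, .is_special = false },
    };
    try std.testing.expect(compute(true, 7, &items) == 7);
    try std.testing.expect(compute(false, 7, &items) == 3);
}
```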

ghost commented 3 years ago

Can you make a simple example with the proposed syntax how this would turn out?

Not significantly simpler. But the control flow here is actually complex enough to need the labeled break, whereas the most common use of this feature follows the pattern

const foo = blk: {
    // compute result
    break :blk result;
};

where the break and the label only contribute noise.

InKryption commented 3 years ago

Certainly using a block expression to avoid scope pollution fits squarely within the style of the Zig language, regardless of the merits of ML-style with or Lisp-style let, so it is strange that it is currently forbidden.

I'm not 100% sure we're on the same page; to clarify, I'm referring to the fact that the correct way to limit the scope of variables, like this:

{
    var i: usize = 0;
    while (i < N) : (i += 1) {...}
}

is somewhat unpopular because it takes more than a few keystrokes (depending on your IDE/text editor), so people fairly often omit the braces, letting the i variable leak into the rest of the scope. Given that, my point is that this proposal would not change that reality, since making all blocks expressions doesn't affect purely imperative sections of code like the one above. So whether or not it's in Zig style, the proponents of the use case you're trying to cater to would likely disagree.
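A compilable version of that scoping pattern might look like this; the loop body and the function name sumFirst are hypothetical, added only so the snippet does real work. The extra pair of braces is exactly the cost being discussed:

```zig
const std = @import("std");

// Hypothetical helper: sums the first `n` elements of `items`.
fn sumFirst(items: []const u32, n: usize) u32 {
    var total: u32 = 0;
    // The bare block limits the visibility of the loop counter `i`.
    {
        var i: usize = 0;
        while (i < n) : (i += 1) {
            total += items[i];
        }
    }
    // `i` is out of scope here; referring to it would be a compile error.
    return total;
}

test "scoped loop counter" {
    const items = [_]u32{ 1, 2, 3, 4 };
    try std.testing.expect(sumFirst(&items, 3) == 6);
}
```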

Beyond that, this is not so much a critique of the proposal as me pointing out that one of its claims is a bit misplaced.

The problem with adding a break is that control flow keywords (break, continue, return) are intended to noisily flag exceptions to normal block structured control flow which does not exist in this case.

Is that so? My interpretation of the control flow keywords was that they are essentially just sane, controlled gotos, which is what the compiler turns them into anyway. From that perspective, I suppose your analogy that labelling blocks is akin to labelling assembly would be quite apt.

But from here it's just a matter of opinion, I suppose. I would be in favor of using a bare break (or some other keyword) to return from blocks, but I also don't feel too strongly about whether this is accepted as is.

acarrico commented 3 years ago

@MasonRemaley, thank you for putting your finger on the exact issue. I agree that semicolons and commas should be invisible; they are artifacts of the grammar. You should not have to read the line endings to grok the semantics. These comments have really helped me fully understand the issue which doomed Zig's original block expression syntax.

I amend the proposal to adopt the following solution: semicolons should be accepted as both a separator and a terminator in lists of expressions/statements, just as is now done for commas. This resolves the ambiguity.

NOTE: your example is somewhat distracting because it is a labeled block, which necessarily reflects exceptional control flow, whereas the proposal is about normal control flow, so I will not repeat the whole example here. To answer briefly: under this amendment, your example could end in "break :blk sum;", "sum;" or "sum", all with the same effect.

acarrico commented 3 years ago

@InKryption: understood, #8019 stands on its own merits (edit: I have updated the proposal to reflect your point). And I could have added goto to that list.

ghost commented 3 years ago

Some usage statistics that may be of interest:

From the Zig code base.

Returning values from blocks is now the primary use of break. Loop control trails behind by a fair margin.

It is difficult to search directly for labeled breaks that genuinely need the label (due to multi-level short-circuiting) as opposed to simple block value returns, so I selected 25 occurrences at random and examined them manually. The results were:

(Correction: the needed / not needed statistics were swapped.)

To summarize, about 85-90% of labeled breaks are used for returning block values, and of those, a similar proportion is used in cases where the last expression in the block could be returned implicitly.

Reproducibility

The statistics were obtained by grepping the Zig code base (4 day old master, 0c091feb5ae52caf1ebf885c0de55b3159207001), combined with manual filtering of occurrences in strings and comments.

Occurrences examined in my random sample, with my estimate of whether the break could be replaced with an implicit return:

  1. "/src/Compilation.zig:504" (yes)
  2. "/src/Zir.zig:3823" (yes)
  3. "/src/link/MachO.zig:1425" (yes)
  4. "/lib/std/fs.zig:1307" (no)
  5. "/lib/std/dwarf.zig:696" (yes)
  6. "/src/codegen/wasm.zig:576" (maybe)
  7. "/src/AstGen.zig:1187" (no)
  8. "/src/Compilation.zig:2712" (yes)
  9. "/src/Sema.zig:9372" (yes)
  10. "/src/AstGen.zig:5012" (yes)
  11. "/src/Compilation.zig:972" (yes)
  12. "/src/codegen/wasm.zig:575" (maybe)
  13. "/src/Compilation.zig:965" (yes)
  14. "/src/link/tapi/yaml.zig:124" (yes)
  15. "/src/translate_c/ast.zig:2671" (yes)
  16. "/lib/std/crypto/25519/edwards25519.zig:266" (yes)
  17. "/lib/std/Thread/Futex.zig:207" (yes)
  18. "/src/link/MachO/TextBlock.zig:1114" (yes)
  19. "/src/codegen.zig:4352" (yes)
  20. "/src/Compilation.zig:1766" (yes)
  21. "/src/Zir.zig:4777" (yes)
  22. "/src/Compilation.zig:1088" (yes)
  23. "/lib/std/dwarf.zig:705" (yes)
  24. "./src/AstGen.zig:2540" (yes)
  25. "/src/Sema.zig:2769" (maybe)

acarrico commented 3 years ago

@zzyxyzz thank you for the data (and for the correction; I was confused before it arrived). It does seem to confirm my notion that unlabeled block expressions would find quite a bit of use.

It would be very difficult to gather data, but I would imagine there are also many cases where blocks terminate with an unnecessary assignment which could be avoided with a block expression.

yohannd1 commented 3 years ago

I think I've mentioned this before in some other issue, but I don't remember which one it was:

I'm also in favor of using break without a label to return a value from unlabelled blocks.

I found the Rust code always very hard to read as i have to actually look at line endings instead of just not reading them.

Maybe we could prepend some symbol to the block from which we want to return the value? Or just omit the identifier name?

const currently = blk: { break :blk 10; };
const suggestion1 = : { break 10; };
const suggestion2 = : { break : 10; };

By the way, there is a potential syntax ambiguity in suggestion2: does break : x mean "break the block with label x" or "break the current block and return the value of x"? Either way, it's still about the same number of characters to type, but at least there's no need to come up with a new identifier for blocks in simple situations. This gets especially useful for nested blocks that don't need to break to outer scopes:


const x = : {
    var some_val = : {
        var x = 20;
        x += 5;
        break x;
    };
    break some_val + 20;
};
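For comparison, the nested example above can be written in current Zig with two labeled blocks, each break naming only its own block's label. This sketch renames the inner x to y for clarity, and the function name nested is hypothetical:

```zig
const std = @import("std");

// Hypothetical function mirroring the nested example with today's syntax.
fn nested() i32 {
    const x = outer: {
        const some_val = inner: {
            var y: i32 = 20;
            y += 5;
            break :inner y; // leaves only the inner block
        };
        break :outer some_val + 20; // leaves the outer block
    };
    return x;
}

test "nested labeled blocks" {
    try std.testing.expect(nested() == 45);
}
```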

acarrico commented 2 years ago

@YohananDiamond, thanks for the input. This issue is focused on the normal, documented, expression-oriented block structure of Zig. It seeks to pinpoint and fix the concerns which caused this behavior to be abandoned in #629. I think any tinkering with the break syntax introduced in #629 should be a separate issue.

The idea is that the reader of a program should identify normal flow by keyword-free expressions, so that labels and keywords can be reserved to clearly flag exceptional control flow. I believe the conversation so far did highlight the primary concern people had with the old Zig syntax: a block's semantics would change based on the presence or absence of a trailing semicolon.

The proposed solution is to restore block expressions without that confusing quirk by always returning the value of the final statement/expression of the block. This would allow programmers to treat semicolon as a separator or a terminator just like comma is currently treated in Zig.

ghost commented 2 years ago

I found the Rust code always very hard to read as i have to actually look at line endings instead of just not reading them

I agree that semicolons and commas should be invisible; they are artifacts of the grammar. You should not have to read the line endings to grok the semantics.

In defense of Rust's no-semicolon rule, I'd like to point out that it is not really necessary to read line endings to identify implicit returns. There are other clues as well, e.g., that a value is produced but not discarded with _ = val;, or the fact that the code is found in a value-returning context such as const foo = { ... bar };. Seen like that, the missing semicolon is merely an additional clue, and I don't see how not having it would improve readability.

The other advantage of the rule is that it is consistent with if (foo) bar else baz and switch syntax, where returned values are not ;-terminated either.
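To make the comparison concrete, here is a sketch of the if and switch expressions in current Zig that the comment refers to (classify is a hypothetical name). Branch values carry no trailing semicolon, switch prongs are separated by commas, and a prong that needs several statements still falls back to a labeled block:

```zig
const std = @import("std");

// Hypothetical function illustrating value-producing `if` and `switch`.
fn classify(n: u8) u8 {
    // `if` expression: each branch is a bare value with no `;` after it.
    const base: u8 = if (n % 2 == 0) 10 else 20;
    // `switch` expression: prongs are separated by commas, not semicolons.
    return switch (n) {
        0 => base,
        1...9 => base + 1,
        // A prong that needs several statements still uses a labeled block.
        else => blk: {
            const extra: u8 = n / 10;
            break :blk base + extra;
        },
    };
}

test "if and switch as expressions" {
    try std.testing.expect(classify(0) == 10);
    try std.testing.expect(classify(3) == 21);
    try std.testing.expect(classify(12) == 11);
}
```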

acarrico commented 2 years ago

@zzyxyzz, I'm not parsing part of your comment: "Seen like that, the missing semicolon is merely an additional clue, and I don't see how not having it would improve readability."

I agree that _ = val; is a clear way to discard the final value of a block expression. These should pass:

test "block expression (no semicolon)" {
    const three: u8 = { 3 };
    try expect(three == 3);
}

test "block expression (with semicolon)" {
    const three: u8 = { 3; };
    try expect(three == 3);
}

test "block statement" {
    const v0: void = { _ = 3; };
    const v1: void = undefined;
    try expect(v0 == v1);
}

ghost commented 2 years ago

@acarrico Oh, so you meant the semicolon on the last statement/expression in a block to be optional? But why? Terminating every statement with a semicolon is firmly established style in C-like languages, and allowing the last statement to be either terminated or unterminated would just be a second way to do the same exact thing, and would convey zero information to the reader.

If, on the other hand, the semicolon is given the precise semantics of terminating a void-valued statement, then the lack of it at the end of the last line of the block becomes a reliable visual indicator of a value being returned. Although, as I pointed out in my previous comment, even that is not strictly necessary either for the reader or the compiler.

acarrico commented 2 years ago

"Terminating every statement with a semicolon is firmly established style in C-like languages, and allowing the last statement to be either terminated or unterminated would just be a second way to do the same exact thing"

@zzyxyzz, in C, blocks are statements, but in Zig blocks are expressions. Personally, I don't care whether a semicolon is required or optional after the last expression, but that semicolon is just syntax terminating or separating the elements of the block: statements are void-valued expressions. There is no further distinction, so the semicolon's presence or absence shouldn't change the meaning of the block expression.

acarrico commented 2 years ago

Sorry! Accidentally "closed-with-comment" the issue.

acarrico commented 2 years ago

@zzyxyzz, "But why?" Only because there is no agreement on whether the semicolon should be present or absent, and the answer doesn't matter. Allowing it to be optional in {x; y; z;} is similar to the comma in (x, y, z,), which some people hate and some people like. Personally I think of the semicolon as a terminator and the comma as a separator, but it is a matter of syntax, not semantics. I would have no problem requiring the semicolon to terminate expressions if people agree (to be honest, that is actually my preference, since I am used to C).

I would accept either syntax as long as we achieve the documented behavior: "Blocks are expressions." That trailing semicolon shouldn't turn them into statements.

ghost commented 2 years ago

@acarrico I see your point, and it's certainly true that making the trailing semicolon optional does not change semantics in any way. I was only expressing my syntactic preferences.

Re lists: Making the last comma optional in lists makes sense, IMHO, because lists (and field declarations) are commonly written in two different layouts: One style is to put the list on one line, as in { a, b, c }. Here the comma has the look and feel of a separator, and it's a bit weird to add a dangling comma at the end. The second style is each item on its own line:

{
    a,
    b,
    c,
}

In this case the trailing comma is preferable, because it feels a bit more like a terminator, and also reduces noise in diffs when you add or remove fields.

Statements, on the other hand, overwhelmingly use the one-per-line style (at least in C-like languages), where the terminator view is more appropriate. So I don't see much practical need for treating the semicolon as a separator.

But this is just my opinion, and we are heading into bikeshedding territory here 😄.

Mouvedia commented 2 years ago

@andrewrk Is there a reason for the closure? This proposal has 10+ upvotes.

andrewrk commented 2 years ago

It's closed because the proposal has been rejected. The number of upvotes a proposal has is meaningless.

I've already considered (and implemented) block expressions, and then removed them from the language, and I also reconsidered adding them back when reading the comments in this issue. Ultimately a decision needs to be made. And here it is.

acarrico commented 2 years ago

@andrewrk, thanks for considering the proposal!

efjimm commented 2 weeks ago

I understand this issue is closed, but as of 37df6ba86e3f4e0f5d6a20ea8dad8f661fe0849e the Zig repository contains 1087 instances of labelled blocks whose label is blk. TigerBeetle contains 84 instances, and Bun contains 920. You can verify this by running grep -RE '\bblk\s*:\s*\{' | wc -l in the aforementioned directories.

I argue that in almost all of these cases the label is redundant, serving only as noise, and requiring it adds unnecessary friction.

The implementation proposed by yohannd1, where block expressions start with :{ and can be broken from with break x;, would put blocks on a par with loops, which have labelled and unlabelled variants with the same break semantics. It also would not introduce any ambiguity with normal blocks.
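The loop half of that comparison can be shown in current Zig; the proposed :{ ... } block syntax itself is not valid Zig and is not shown. In this sketch (firstEven and hasSumOfTwo are hypothetical names), an unlabeled break yields a value from the innermost loop, with else supplying the result when no break happens, and a labeled break reaches the outer loop from an inner one:

```zig
const std = @import("std");

// Unlabeled form: `break value` gives the loop its result, and the `else`
// branch supplies the result when the loop ends without breaking.
fn firstEven(items: []const u32) ?u32 {
    return for (items) |item| {
        if (item % 2 == 0) break item;
    } else null;
}

// Labeled form: yields a value from the outer loop while inside the inner one.
// (In this hypothetical example an element may be paired with itself.)
fn hasSumOfTwo(items: []const u32, target: u32) bool {
    return outer: for (items) |a| {
        for (items) |b| {
            if (a + b == target) break :outer true;
        }
    } else false;
}

test "loops yield values with labeled and unlabeled break" {
    const items = [_]u32{ 1, 3, 4, 6 };
    try std.testing.expect(firstEven(&items).? == 4);
    try std.testing.expect(firstEven(&[_]u32{ 1, 3 }) == null);
    try std.testing.expect(hasSumOfTwo(&items, 7));
    try std.testing.expect(!hasSumOfTwo(&items, 100));
}
```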