zig fmt: embrace alignment

After working with Zig for several years, I want to propose to reverse the current auto-format stance on alignment.

I don't think anyone disagrees that readability is improved when certain things are more aligned, especially when it's tables of data. We can find tables of data in plenty of real world projects. Furthermore, it has become common to use zig fmt: off in many switch expressions, a pattern that certainly helps avoid bugs.

My arguments against alignment were primarily that it harms line-based text diffs and conflict resolution since a single-line change may affect more than one line. Secondarily, that the auto-formatter may not have enough information to make a reasonable choice on when or how to align things.

Regarding the first point, I think that Zig should look towards the future rather than status quo. Will we be stuck with contemporary version control software forever? Perhaps not. Modern tools such as git already support ignoring whitespace changes, both in viewing diffs and in doing conflict resolution. Future tools may offer something beyond text diffs, so let's not let that hold us down, and in fact let's provide some incentive to innovate. I think the value of alignment is more than I previously estimated, and the harm of line diff noise less than I previously estimated.

As for when or how to align things, I have a concrete algorithm to propose:

Break the code into a set of line groups. Line groups are delineated by the start of file, end of file, and two consecutive line breaks.
When all lines in a line group have the same token tags in the same order, that line group is aligned such that each token shares the same column for the leftmost character of the token, except for number literals, which share the same column for the rightmost character.
- alternative: choose left-aligned or right-aligned based on which one would provide the greater subset of shared characters across lines

What about unicode? Two options here:

non-ascii codepoints disqualify a line from auto-alignment
non-ascii codepoints disqualify a line from auto-alignment, but only until the first non-ascii codepoint is found. I think this would be fine in practice.

Optional extension to this proposal: // zig fmt: align could be used to additionally opt-in to more aggressive alignment for any particular line group which would do alignment even if the list of token tags were not identical.

I think if we want to proceed with this, we should start by implementing the algorithm in zig fmt, run it on some real world codebases, and then see if the maintainers of those projects find the changes favorable.

2196

To give some visuals to this proposal, here's what would happen to this block of code: https://github.com/ziglang/zig/blob/8fb4a4efbabbe3fe98521201eac41feee5a9a50a/lib/std/multi_array_list.zig#L687-L753

    try list.ensureTotalCapacity(ally, 20);

    try list.append(ally, .{ .tag = .keyword_const , .start =   0 });
    try list.append(ally, .{ .tag = .identifier    , .start =   6 });
    try list.append(ally, .{ .tag = .equal         , .start =  10 });
    try list.append(ally, .{ .tag = .builtin       , .start =  12 });
    try list.append(ally, .{ .tag = .l_paren       , .start =  19 });
    try list.append(ally, .{ .tag = .string_literal, .start =  20 });
    try list.append(ally, .{ .tag = .r_paren       , .start =  25 });
    try list.append(ally, .{ .tag = .semicolon     , .start =  26 });
    try list.append(ally, .{ .tag = .keyword_pub   , .start =  29 });
    try list.append(ally, .{ .tag = .keyword_fn    , .start =  33 });
    try list.append(ally, .{ .tag = .identifier    , .start =  36 });
    try list.append(ally, .{ .tag = .l_paren       , .start =  40 });
    try list.append(ally, .{ .tag = .r_paren       , .start =  41 });
    try list.append(ally, .{ .tag = .identifier    , .start =  43 });
    try list.append(ally, .{ .tag = .bang          , .start =  51 });
    try list.append(ally, .{ .tag = .identifier    , .start =  52 });
    try list.append(ally, .{ .tag = .l_brace       , .start =  57 });
    try list.append(ally, .{ .tag = .identifier    , .start =  63 });
    try list.append(ally, .{ .tag = .period        , .start =  66 });
    try list.append(ally, .{ .tag = .identifier    , .start =  67 });
    try list.append(ally, .{ .tag = .period        , .start =  70 });
    try list.append(ally, .{ .tag = .identifier    , .start =  71 });
    try list.append(ally, .{ .tag = .l_paren       , .start =  75 });
    try list.append(ally, .{ .tag = .string_literal, .start =  76 });
    try list.append(ally, .{ .tag = .comma         , .start = 113 });
    try list.append(ally, .{ .tag = .period        , .start = 115 });
    try list.append(ally, .{ .tag = .l_brace       , .start = 116 });
    try list.append(ally, .{ .tag = .r_brace       , .start = 117 });
    try list.append(ally, .{ .tag = .r_paren       , .start = 118 });
    try list.append(ally, .{ .tag = .semicolon     , .start = 119 });
    try list.append(ally, .{ .tag = .r_brace       , .start = 121 });
    try list.append(ally, .{ .tag = .eof           , .start = 123 });

Note that if there was no blank line after the first line, then the alignment wouldn't kick in. Consequentially, nothing would happen to the block of testing.expect calls, because there is no blank line separating the tags assignment with the rest of the lines.

Another example from the same file: https://github.com/ziglang/zig/blob/8fb4a4efbabbe3fe98521201eac41feee5a9a50a/lib/std/multi_array_list.zig#L870-L877

The proposal here would not solve this alignment use case, because 'b' and 1 have different token tags (i believe i know what a token tag is.). So for this block of code, we'd still need // zig fmt: off.

One more exercise: I was curious what would happen if we removed the first clause of the specification about separating into intentional blocks, and I tried formatting the entire multi_array_list.zig file using only the second rule: Any sequence of consecutive lines that have the same sequence of token tags gets aligned. To be clear, this is not what is being proposed here, but it gives illustrations of how things could be aligned if blank lines were inserted in various places, which I think is interesting.

diff --git a/lib/std/multi_array_list.zig b/lib/std/multi_array_list.zig
index 8b5df4a2e..0cb694240 100644
--- a/lib/std/multi_array_list.zig
+++ b/lib/std/multi_array_list.zig
@@ -1,10 +1,10 @@
-const std = @import("std");
+const std     = @import("std"    );
 const builtin = @import("builtin");
 const assert = std.debug.assert;
-const meta = std.meta;
-const mem = std.mem;
+const meta      = std.meta     ;
+const mem       = std.mem      ;
 const Allocator = mem.Allocator;
-const testing = std.testing;
+const testing   = std.testing  ;

 /// A MultiArrayList stores a list of a struct or tagged union type.
 /// Instead of storing a single list of items, MultiArrayList
@@ -20,7 +20,7 @@ const testing = std.testing;
 pub fn MultiArrayList(comptime T: type) type {
     return struct {
         bytes: [*]align(@alignOf(T)) u8 = undefined,
-        len: usize = 0,
+        len     : usize = 0,
         capacity: usize = 0,

         const Elem = switch (@typeInfo(T)) {
@@ -35,7 +35,7 @@ pub fn MultiArrayList(comptime T: type) type {
                 } });
                 pub const Tag =
                     u.tag_type orelse @compileError("MultiArrayList does not support untagged unions");
-                tags: Tag,
+                tags: Tag ,
                 data: Bare,

                 pub fn fromT(outer: T) @This() {
@@ -66,7 +66,7 @@ pub fn MultiArrayList(comptime T: type) type {
             /// This array is indexed by the field index which can be obtained
             /// by using @intFromEnum() on the Field enum
             ptrs: [fields.len][*]u8,
-            len: usize,
+            len     : usize,
             capacity: usize,

             pub fn items(self: Slice, comptime field: Field) []FieldType(field) {
@@ -113,7 +113,7 @@ pub fn MultiArrayList(comptime T: type) type {
                 const aligned_ptr: [*]align(@alignOf(Elem)) u8 = @alignCast(unaligned_ptr);
                 return .{
                     .bytes = aligned_ptr,
-                    .len = self.len,
+                    .len      = self.len,
                     .capacity = self.capacity,
                 };
             }
@@ -141,9 +141,9 @@ pub fn MultiArrayList(comptime T: type) type {
         /// `sizes.fields` is an array mapping from `sizes.bytes` array index to field index.
         const sizes = blk: {
             const Data = struct {
-                size: usize,
+                size      : usize,
                 size_index: usize,
-                alignment: usize,
+                alignment : usize,
             };
             var data: [fields.len]Data = undefined;
             for (fields, 0..) |field_info, i| {
@@ -160,14 +160,14 @@ pub fn MultiArrayList(comptime T: type) type {
                 }
             };
             mem.sort(Data, &data, {}, Sort.lessThan);
-            var sizes_bytes: [fields.len]usize = undefined;
+            var sizes_bytes  : [fields.len]usize = undefined;
             var field_indexes: [fields.len]usize = undefined;
             for (data, 0..) |elem, i| {
-                sizes_bytes[i] = elem.size;
+                sizes_bytes  [i] = elem.size      ;
                 field_indexes[i] = elem.size_index;
             }
             break :blk .{
-                .bytes = sizes_bytes,
+                .bytes  = sizes_bytes  ,
                 .fields = field_indexes,
             };
         };
@@ -347,7 +347,7 @@ pub fn MultiArrayList(comptime T: type) type {
                 return;
             }
             assert(new_len <= self.capacity);
-            assert(new_len <= self.len);
+            assert(new_len <= self.len     );

             const other_bytes = gpa.alignedAlloc(
                 u8,
@@ -371,10 +371,10 @@ pub fn MultiArrayList(comptime T: type) type {
             var other = Self{
                 .bytes = other_bytes.ptr,
                 .capacity = new_len,
-                .len = new_len,
+                .len      = new_len,
             };
             self.len = new_len;
-            const self_slice = self.slice();
+            const self_slice  = self .slice();
             const other_slice = other.slice();
             inline for (fields, 0..) |field_info, i| {
                 if (@sizeOf(field_info.type) != 0) {
@@ -555,7 +555,7 @@ pub fn MultiArrayList(comptime T: type) type {
                 .type = *fields[i].type,
                 .default_value = null,
                 .is_comptime = fields[i].is_comptime,
-                .alignment = fields[i].alignment,
+                .alignment   = fields[i].alignment  ,
             };
             break :entry @Type(.{ .Struct = .{
                 .layout = .Extern,
@@ -567,7 +567,7 @@ pub fn MultiArrayList(comptime T: type) type {
         /// This function is used in the debugger pretty formatters in tools/ to fetch the
         /// child field order and entry type to facilitate fancy debug printing for this type.
         fn dbHelper(self: *Self, child: *Elem, field: *Field, entry: *Entry) void {
-            _ = self;
+            _ = self ;
             _ = child;
             _ = field;
             _ = entry;
@@ -627,8 +627,8 @@ test "basic usage" {
     try testing.expectEqualSlices(u8, list.items(.c), &[_]u8{ 'a', 'b', 'c' });

     try testing.expectEqual(@as(usize, 3), list.items(.b).len);
-    try testing.expectEqualStrings("foobar", list.items(.b)[0]);
-    try testing.expectEqualStrings("zigzag", list.items(.b)[1]);
+    try testing.expectEqualStrings("foobar",   list.items(.b)[0]);
+    try testing.expectEqualStrings("zigzag",   list.items(.b)[1]);
     try testing.expectEqualStrings("fizzbuzz", list.items(.b)[2]);

     // Add 6 more things to force a capacity increase.
@@ -658,8 +658,8 @@ test "basic usage" {
     try testing.expectEqualSlices(u8, list.items(.c), &[_]u8{ 'a', 'b', 'c' });

     try testing.expectEqual(@as(usize, 3), list.items(.b).len);
-    try testing.expectEqualStrings("foobar", list.items(.b)[0]);
-    try testing.expectEqualStrings("zigzag", list.items(.b)[1]);
+    try testing.expectEqualStrings("foobar",   list.items(.b)[0]);
+    try testing.expectEqualStrings("zigzag",   list.items(.b)[1]);
     try testing.expectEqualStrings("fizzbuzz", list.items(.b)[2]);

     list.set(try list.addOne(ally), .{
@@ -686,71 +686,71 @@ test "regression test for @reduce bug" {

     try list.ensureTotalCapacity(ally, 20);

-    try list.append(ally, .{ .tag = .keyword_const, .start = 0 });
-    try list.append(ally, .{ .tag = .identifier, .start = 6 });
-    try list.append(ally, .{ .tag = .equal, .start = 10 });
-    try list.append(ally, .{ .tag = .builtin, .start = 12 });
-    try list.append(ally, .{ .tag = .l_paren, .start = 19 });
-    try list.append(ally, .{ .tag = .string_literal, .start = 20 });
-    try list.append(ally, .{ .tag = .r_paren, .start = 25 });
-    try list.append(ally, .{ .tag = .semicolon, .start = 26 });
-    try list.append(ally, .{ .tag = .keyword_pub, .start = 29 });
-    try list.append(ally, .{ .tag = .keyword_fn, .start = 33 });
-    try list.append(ally, .{ .tag = .identifier, .start = 36 });
-    try list.append(ally, .{ .tag = .l_paren, .start = 40 });
-    try list.append(ally, .{ .tag = .r_paren, .start = 41 });
-    try list.append(ally, .{ .tag = .identifier, .start = 43 });
-    try list.append(ally, .{ .tag = .bang, .start = 51 });
-    try list.append(ally, .{ .tag = .identifier, .start = 52 });
-    try list.append(ally, .{ .tag = .l_brace, .start = 57 });
-    try list.append(ally, .{ .tag = .identifier, .start = 63 });
-    try list.append(ally, .{ .tag = .period, .start = 66 });
-    try list.append(ally, .{ .tag = .identifier, .start = 67 });
-    try list.append(ally, .{ .tag = .period, .start = 70 });
-    try list.append(ally, .{ .tag = .identifier, .start = 71 });
-    try list.append(ally, .{ .tag = .l_paren, .start = 75 });
-    try list.append(ally, .{ .tag = .string_literal, .start = 76 });
-    try list.append(ally, .{ .tag = .comma, .start = 113 });
-    try list.append(ally, .{ .tag = .period, .start = 115 });
-    try list.append(ally, .{ .tag = .l_brace, .start = 116 });
-    try list.append(ally, .{ .tag = .r_brace, .start = 117 });
-    try list.append(ally, .{ .tag = .r_paren, .start = 118 });
-    try list.append(ally, .{ .tag = .semicolon, .start = 119 });
-    try list.append(ally, .{ .tag = .r_brace, .start = 121 });
-    try list.append(ally, .{ .tag = .eof, .start = 123 });
+    try list.append(ally, .{ .tag = .keyword_const , .start =   0 });
+    try list.append(ally, .{ .tag = .identifier    , .start =   6 });
+    try list.append(ally, .{ .tag = .equal         , .start =  10 });
+    try list.append(ally, .{ .tag = .builtin       , .start =  12 });
+    try list.append(ally, .{ .tag = .l_paren       , .start =  19 });
+    try list.append(ally, .{ .tag = .string_literal, .start =  20 });
+    try list.append(ally, .{ .tag = .r_paren       , .start =  25 });
+    try list.append(ally, .{ .tag = .semicolon     , .start =  26 });
+    try list.append(ally, .{ .tag = .keyword_pub   , .start =  29 });
+    try list.append(ally, .{ .tag = .keyword_fn    , .start =  33 });
+    try list.append(ally, .{ .tag = .identifier    , .start =  36 });
+    try list.append(ally, .{ .tag = .l_paren       , .start =  40 });
+    try list.append(ally, .{ .tag = .r_paren       , .start =  41 });
+    try list.append(ally, .{ .tag = .identifier    , .start =  43 });
+    try list.append(ally, .{ .tag = .bang          , .start =  51 });
+    try list.append(ally, .{ .tag = .identifier    , .start =  52 });
+    try list.append(ally, .{ .tag = .l_brace       , .start =  57 });
+    try list.append(ally, .{ .tag = .identifier    , .start =  63 });
+    try list.append(ally, .{ .tag = .period        , .start =  66 });
+    try list.append(ally, .{ .tag = .identifier    , .start =  67 });
+    try list.append(ally, .{ .tag = .period        , .start =  70 });
+    try list.append(ally, .{ .tag = .identifier    , .start =  71 });
+    try list.append(ally, .{ .tag = .l_paren       , .start =  75 });
+    try list.append(ally, .{ .tag = .string_literal, .start =  76 });
+    try list.append(ally, .{ .tag = .comma         , .start = 113 });
+    try list.append(ally, .{ .tag = .period        , .start = 115 });
+    try list.append(ally, .{ .tag = .l_brace       , .start = 116 });
+    try list.append(ally, .{ .tag = .r_brace       , .start = 117 });
+    try list.append(ally, .{ .tag = .r_paren       , .start = 118 });
+    try list.append(ally, .{ .tag = .semicolon     , .start = 119 });
+    try list.append(ally, .{ .tag = .r_brace       , .start = 121 });
+    try list.append(ally, .{ .tag = .eof           , .start = 123 });

     const tags = list.items(.tag);
-    try testing.expectEqual(tags[1], .identifier);
-    try testing.expectEqual(tags[2], .equal);
-    try testing.expectEqual(tags[3], .builtin);
-    try testing.expectEqual(tags[4], .l_paren);
-    try testing.expectEqual(tags[5], .string_literal);
-    try testing.expectEqual(tags[6], .r_paren);
-    try testing.expectEqual(tags[7], .semicolon);
-    try testing.expectEqual(tags[8], .keyword_pub);
-    try testing.expectEqual(tags[9], .keyword_fn);
-    try testing.expectEqual(tags[10], .identifier);
-    try testing.expectEqual(tags[11], .l_paren);
-    try testing.expectEqual(tags[12], .r_paren);
-    try testing.expectEqual(tags[13], .identifier);
-    try testing.expectEqual(tags[14], .bang);
-    try testing.expectEqual(tags[15], .identifier);
-    try testing.expectEqual(tags[16], .l_brace);
-    try testing.expectEqual(tags[17], .identifier);
-    try testing.expectEqual(tags[18], .period);
-    try testing.expectEqual(tags[19], .identifier);
-    try testing.expectEqual(tags[20], .period);
-    try testing.expectEqual(tags[21], .identifier);
-    try testing.expectEqual(tags[22], .l_paren);
+    try testing.expectEqual(tags[ 1], .identifier    );
+    try testing.expectEqual(tags[ 2], .equal         );
+    try testing.expectEqual(tags[ 3], .builtin       );
+    try testing.expectEqual(tags[ 4], .l_paren       );
+    try testing.expectEqual(tags[ 5], .string_literal);
+    try testing.expectEqual(tags[ 6], .r_paren       );
+    try testing.expectEqual(tags[ 7], .semicolon     );
+    try testing.expectEqual(tags[ 8], .keyword_pub   );
+    try testing.expectEqual(tags[ 9], .keyword_fn    );
+    try testing.expectEqual(tags[10], .identifier    );
+    try testing.expectEqual(tags[11], .l_paren       );
+    try testing.expectEqual(tags[12], .r_paren       );
+    try testing.expectEqual(tags[13], .identifier    );
+    try testing.expectEqual(tags[14], .bang          );
+    try testing.expectEqual(tags[15], .identifier    );
+    try testing.expectEqual(tags[16], .l_brace       );
+    try testing.expectEqual(tags[17], .identifier    );
+    try testing.expectEqual(tags[18], .period        );
+    try testing.expectEqual(tags[19], .identifier    );
+    try testing.expectEqual(tags[20], .period        );
+    try testing.expectEqual(tags[21], .identifier    );
+    try testing.expectEqual(tags[22], .l_paren       );
     try testing.expectEqual(tags[23], .string_literal);
-    try testing.expectEqual(tags[24], .comma);
-    try testing.expectEqual(tags[25], .period);
-    try testing.expectEqual(tags[26], .l_brace);
-    try testing.expectEqual(tags[27], .r_brace);
-    try testing.expectEqual(tags[28], .r_paren);
-    try testing.expectEqual(tags[29], .semicolon);
-    try testing.expectEqual(tags[30], .r_brace);
-    try testing.expectEqual(tags[31], .eof);
+    try testing.expectEqual(tags[24], .comma         );
+    try testing.expectEqual(tags[25], .period        );
+    try testing.expectEqual(tags[26], .l_brace       );
+    try testing.expectEqual(tags[27], .r_brace       );
+    try testing.expectEqual(tags[28], .r_paren       );
+    try testing.expectEqual(tags[29], .semicolon     );
+    try testing.expectEqual(tags[30], .r_brace       );
+    try testing.expectEqual(tags[31], .eof           );
 }

 test "ensure capacity on empty list" {
@@ -769,23 +769,23 @@ test "ensure capacity on empty list" {
     list.appendAssumeCapacity(.{ .a = 3, .b = 4 });

     try testing.expectEqualSlices(u32, &[_]u32{ 1, 3 }, list.items(.a));
-    try testing.expectEqualSlices(u8, &[_]u8{ 2, 4 }, list.items(.b));
+    try testing.expectEqualSlices(u8 , &[_]u8 { 2, 4 }, list.items(.b));

     list.len = 0;
     list.appendAssumeCapacity(.{ .a = 5, .b = 6 });
     list.appendAssumeCapacity(.{ .a = 7, .b = 8 });

     try testing.expectEqualSlices(u32, &[_]u32{ 5, 7 }, list.items(.a));
-    try testing.expectEqualSlices(u8, &[_]u8{ 6, 8 }, list.items(.b));
+    try testing.expectEqualSlices(u8 , &[_]u8 { 6, 8 }, list.items(.b));

     list.len = 0;
     try list.ensureTotalCapacity(ally, 16);

-    list.appendAssumeCapacity(.{ .a = 9, .b = 10 });
+    list.appendAssumeCapacity(.{ .a =  9, .b = 10 });
     list.appendAssumeCapacity(.{ .a = 11, .b = 12 });

-    try testing.expectEqualSlices(u32, &[_]u32{ 9, 11 }, list.items(.a));
-    try testing.expectEqualSlices(u8, &[_]u8{ 10, 12 }, list.items(.b));
+    try testing.expectEqualSlices(u32, &[_]u32{  9, 11 }, list.items(.a));
+    try testing.expectEqualSlices(u8 , &[_]u8 { 10, 12 }, list.items(.b));
 }

 test "insert elements" {
@@ -803,7 +803,7 @@ test "insert elements" {
     try list.ensureUnusedCapacity(ally, 1);
     list.insertAssumeCapacity(1, .{ .a = 2, .b = 3 });

-    try testing.expectEqualSlices(u8, &[_]u8{ 1, 2 }, list.items(.a));
+    try testing.expectEqualSlices(u8 , &[_]u8 { 1, 2 }, list.items(.a));
     try testing.expectEqualSlices(u32, &[_]u32{ 2, 3 }, list.items(.b));
 }

@@ -895,7 +895,7 @@ test "sorting a span" {
         var n: u32 = 3;
         for (sliced.items(.chr)[i..j], sliced.items(.score)[i..j]) |chr, score| {
             try testing.expectEqual(score, n);
-            try testing.expectEqual(chr, c);
+            try testing.expectEqual(chr  , c);
             n += 1;
         }
         c += 1;

The proposed specification needs a bit more restrictions to avoid this obviously unintentional alignment:

fn foo() void {

    if (bar()) {
    if (baz()) {

            qux();

        }
        }
        }

or perhaps zig fmt's existing rules about removing blank lines would make sure this never happens? I think it's probably better to specify that lines with different indentations can never be aligned with each other.

If i wanted to opt into alignment for an array literal, is it even possible to insert blank lines in the right places?

test "" {
    var array = [_]u8{

        0, 1,  2,  3,  4,  5,  6,  7,
        8, 9, 10, 11, 12, 13, 14, 15,

    };
}

Will those blank lines be deleted by zig fmt? are we sure we want to require blank lines to get this alignment to work? Or are we proposing keeping the existing alignment rules for array literals, and adding additional rules?

As another exercise, I manually formatted this file: https://github.com/thejoshwolfe/legend-of-swarkland/blob/661e957e0b96159bcc43071ec3bacd882c84f1a9/src/server/game_model.zig using these modified rules, just to try this out:

Line groups are additionally delimited by a change in indentation level.
(same as above about token tags.) Additionally specify that comments don't count as tokens.

diff --git a/src/server/game_model.zig b/src/server/game_model.zig
index e6552ca..dcb55a0 100644
--- a/src/server/game_model.zig
+++ b/src/server/game_model.zig
@@ -5,13 +5,13 @@ const HashMap = std.HashMap;
 const core = @import("core");
 const Coord = core.geometry.Coord;

-const NewGameSettings = core.protocol.NewGameSettings;
-const Species = core.protocol.Species;
-const Floor = core.protocol.Floor;
-const Wall = core.protocol.Wall;
-const ThingPosition = core.protocol.ThingPosition;
-const EquipmentSlot = core.protocol.EquipmentSlot;
-const TerrainSpace = core.protocol.TerrainSpace;
+const NewGameSettings  = core.protocol.NewGameSettings ;
+const Species          = core.protocol.Species         ;
+const Floor            = core.protocol.Floor           ;
+const Wall             = core.protocol.Wall            ;
+const ThingPosition    = core.protocol.ThingPosition   ;
+const EquipmentSlot    = core.protocol.EquipmentSlot   ;
+const TerrainSpace     = core.protocol.TerrainSpace    ;
 const StatusConditions = core.protocol.StatusConditions;

 const map_gen = @import("./map_gen.zig");
@@ -30,7 +30,7 @@ pub const oob_terrain = TerrainSpace{
     .wall = .stone,
 };
 pub const Terrain = core.matrix.SparseChunkedMatrix(TerrainSpace, oob_terrain, .{
-    .metrics = true, // TODO: map generator should not use metrics
+    .metrics                 = true, // TODO: map generator should not use metrics
     .track_dirty_after_clone = true,
 });

@@ -48,11 +48,11 @@ pub const Individual = struct {
 };

 pub const ItemLocation = union(enum) {
-    floor_coord: Coord,
-    held: HeldLocation,
+    floor_coord: Coord       ,
+    held       : HeldLocation,
 };
 pub const HeldLocation = struct {
-    holder_id: u32,
+    holder_id       : u32          ,
     equipped_to_slot: EquipmentSlot,
 };
 pub const Item = struct {
@@ -73,47 +73,47 @@ pub const Item = struct {

 pub const StateDiff = union(enum) {
     individual_spawn: struct {
-        id: u32,
+        id        : u32       ,
         individual: Individual,
     },
     individual_despawn: struct {
-        id: u32,
+        id        : u32       ,
         individual: Individual,
     },
     individual_reposition: struct {
-        id: u32,
+        id  : u32          ,
         from: ThingPosition,
-        to: ThingPosition,
+        to  : ThingPosition,
     },
     individual_polymorph: struct {
-        id: u32,
+        id  : u32    ,
         from: Species,
-        to: Species,
+        to  : Species,
     },
     individual_status_condition_update: struct {
-        id: u32,
+        id  : u32             ,
         from: StatusConditions,
-        to: StatusConditions,
+        to  : StatusConditions,
     },

     item_spawn: struct {
-        id: u32,
+        id  : u32 ,
         item: Item,
     },
     item_despawn: struct {
-        id: u32,
+        id  : u32 ,
         item: Item,
     },
     item_relocation: struct {
-        id: u32,
+        id  : u32         ,
         from: ItemLocation,
-        to: ItemLocation,
+        to  : ItemLocation,
     },

     terrain_update: struct {
-        at: Coord,
+        at  : Coord       ,
         from: TerrainSpace,
-        to: TerrainSpace,
+        to  : TerrainSpace,
     },

     transition_to_next_level,

With the proposal as originally specified, only the first block in this diff would get changed--the imports from core.protocol.

I might be straying a bit too far from the original proposal with all these excercises, but it's been helpful for my understanding to visualize similar proposals before I form any opinion (at least I can pretend I haven't formed any opinion yet :smile:.). Maybe others will appreciate the exploration as well.

Just my thoughts on it, I think it should be an opt-in feature because I disagree with the premise that it makes thing easier to read, how people are comfortable reading their code is just dependent on their own preferences and that is not common across everyone. Additionally aligned code is harder to maintain (perhaps a moot point though if zig fmt was the thing maintaining it), but that at least is why I do not align code myself as something simple like adding a new line to a file may involve changes across many lines to fix up the alignment or whatever (if say the new line requires more padding across everything else).

Furthermore I do not think a computer can reliably determine what "good" alignment is, maybe in simple cases it can but alignment of stuff is fundamentally an aesthetic choice and what looks best will just depend on the person writing it, especially when it comes to math equations. For instance is this a "proper" alignment:

const foo =
  (a * b + c * d) *
  (z + f);

Or perhaps this may make more sense:

const foo =
  (a * b + c * d) *
      (z + f);

Just depends on if the + is important enough to explicitly align to convey some sort of meaning there mathematically which is beyond the compiler's ability to recognize.

Similarly there's multiple conventions in how arrays are aligned like this being fairly normal:

const foo = [_]u32 {
  1, 2, -3, 5, 7,
  -1, 2, 3, -1, -2,
};

Or maybe this (kinda ugly but I guess it's one way of thinking about it):

const foo = [_]u32 {
  1,  2, -3, 5,  7,
  -1, 2, 3,  -1, -2,
};

Or possibly something more standard like:

const foo = [_]u32 {
   1, 2, -3,  5,  7,
  -1, 2,  3, -1, -2,
};

Or keeping it fully consistent with the amount of padding each element requires like:

const foo = [_]u32 {
   1,  2, -3,  5,  7,
  -1,  2,  3, -1, -2,
};

All of those could be interpreted as valid in some context so I am not sure how the formatter would really decide and make people happy.

A few examples from zig's own codebase where, speaking from personal experience, alignment is kinda necessary to avoid bugs:

Those are all switch statements. Are we sure we want to embrace alignment in general? Or do we just want to enable it in switch statements?

The proposal doesn't even really help that much in those cases.

                    .builtin_extern     => try sema.zirBuiltinExtern(     block, extended),
                    .@"asm"             => try sema.zirAsm(               block, extended, false),
                    .asm_expr           => try sema.zirAsm(               block, extended, true),
                    .typeof_peer        => try sema.zirTypeofPeer(        block, extended),
                    .compile_log        => try sema.zirCompileLog(               extended),
                    .min_multi          => try sema.zirMinMaxMulti(       block, extended, .min),
                    .max_multi          => try sema.zirMinMaxMulti(       block, extended, .max),
                    .add_with_overflow  => try sema.zirOverflowArithmetic(block, extended, extended.opcode),
                    .sub_with_overflow  => try sema.zirOverflowArithmetic(block, extended, extended.opcode),

We need a different proposal to handle this situation.

Here is a refinement of the proposal to address some of the problems that have come to light with the snippets or real actual code discussed so far.

Problem: Floating punctuation looks really silly.

const a = b ; should always be const a = b; instead regardless of alignment.

Proposal:

Define a set of punctuation characters that should "stick to the left":

, .* .? } ] ) ;
: except when immediately following break or continue (a BreakLabel in the grammar)

This means:

// instead of this:
        .name       => try foo(value     .?, extra_value.*, x  , abc),
        .other_name => try f  (long_value.?, something  .*, xyz, a  ),

// you get this:
        .name       => try foo(value.?,      extra_value.*, x,   abc),
        .other_name => try f  (long_value.?, something.*,   xyz, z),

Some realistic examples:

// instead of this:
    try list.append(ally, .{ .tag = .keyword_const , .start =   0 });
    try list.append(ally, .{ .tag = .identifier    , .start =   6 });
    try list.append(ally, .{ .tag = .equal         , .start =  10 });
// you get this:
    try list.append(ally, .{ .tag = .keyword_const,  .start =   0 });
    try list.append(ally, .{ .tag = .identifier,     .start =   6 });
    try list.append(ally, .{ .tag = .equal,          .start =  10 });

// instead of this:
                size      : usize,
                size_index: usize,
                alignment : usize,
// you get this:
                size:       usize,
                size_index: usize,
                alignment:  usize,

// instead of this:
                sizes_bytes  [i] = elem.size      ;
                field_indexes[i] = elem.size_index;
// you get this:
                sizes_bytes  [i] = elem.size;
                field_indexes[i] = elem.size_index;

Problem: The token tag sequence rule is too sensitive

The original specification would not kick in for this block, because of the mismatched number of tokens:

     .mul           => try self.airMul(inst, .normal), 
     .mul_optimized => try self.airMul(inst, .fast), 
     .mul_wrap      => try self.airMulWrap(inst), 
     .mul_sat       => try self.airMulSat(inst),

The proposal as-is would simply not do any alignment on these, even though seeing the Mul aligned on all lines is desirable.

Proposal:

This is a refinement of the above subproposal. Instead of just naming some tokens that "stick to the left", group tokens into "token chunks".

Tokens that always form their own chunk, non-extensible:

roughly "infix" operators: &= ** *= *% *%= *| *|= ^ ^= .. ... = == => != < << <<= <<| <<|= <= -= -%= -| -|= -> % %= || |= + ++ += +% +%= +| +|= > >> >>= >= / /= sometimes - -% & * |
grouping start operators: { [ ( sometimes |

Tokens that form the center of an extensible chunk:

Identifiers
Keywords
Number literals
Character literals

Tokens that extend a chunk to the right:

roughly "suffix" operators: .* .?
grouping end operators: } ] ) sometimes |
sequence delimiters: , ;
name/type delimiter: sometimes :

Tokens that extend a chunk to the left:

roughly "prefix" operators: ! ? ~ sometimes - -% & *
"name prefix" operators: sometimes . :

Tokens always excluded from alignment:

Comments
String literals

To determine what to align in a sequence of lines, align all the corresponding token chunks for as long a sequence of "matching chunks" as you can starting at the beginning of the line. "Matching chunks" here means chunks that match whether they have an extensible center vs a non-extensible center. This is an approximation for whether the lines are roughly "doing the same shape of thing".

Then lines that get aligned with each other align the chunks:

If all corresponding chunks have number literal centers, align the right of the center.
Otherwise, align the left of all the centers.

Illustrative example:

const a = (10  +  2) >> 2;
var   b = (foo - 10) -  bar;
var   c = (foo () - 10) - bar; // mismatches after the function call
var   d = (1   +        1); // alignment from `const a` is still going, but starts a new group with the below.
var   e = (1   + 10000000);

Real examples:

     .mul           => try self.airMul    (inst, .normal), 
     .mul_optimized => try self.airMul    (inst, .fast), 
     .mul_wrap      => try self.airMulWrap(inst), 
     .mul_sat       => try self.airMulSat (inst),

         .lhs = try expr        (gz, scope, .{ .rl = .none }, node_datas[node].lhs), 
         .rhs = try comptimeExpr(gz, scope, .{ .rl = .{ .coerced_ty = .usize_type } }, node_datas[node].rhs), 
                                                   // ^ Here's where the mismatch starts

            .cmp_lt                       => try sema.zirCmp  (block, inst, .lt),
            .cmp_lte                      => try sema.zirCmp  (block, inst, .lte),
            .cmp_eq                       => try sema.zirCmpEq(block, inst, .eq, Air.Inst.Tag.fromCmpOp(.eq, block.float_mode == .Optimized)),
            .cmp_gte                      => try sema.zirCmp  (block, inst, .gte),
            .cmp_gt                       => try sema.zirCmp  (block, inst, .gt),
            .cmp_neq                      => try sema.zirCmpEq(block, inst, .neq, Air.Inst.Tag.fromCmpOp(.neq, block.float_mode == .Optimized)),

I propose that we cannot automatically align this the way the author here has done it:

                    .typeof_peer        => try sema.zirTypeofPeer(        block, extended),
                    .compile_log        => try sema.zirCompileLog(               extended),

The main drawback of this subproposal is its complexity both in specification and implementation. I do not volunteer to write the code that implements this :smile:, and I have low confidence that this proposal would be an improvement in all circumstances. This subproposal is the one I'm least confident about, and without this one, the rest of these subproposals still work together.

Problem: Alignment is distracting and unhelpful in most places

Here are some unnecessary hits for alignment from the examples above:

const meta      = std.meta;
const mem       = std.mem;
const Allocator = mem.Allocator;
const testing   = std.testing;

                size:       usize,
                size_index: usize,
                alignment:  usize,

In these cases the alignment is giving a tiny baby amount of maybe readability and then also incurs the significant drawback of diff noise that has been discussed many times before. The benefit does not outweigh the cost here.

(Aside: in my 4 years of working with Go code, I am quite confident that I find alignment in struct field declarations strictly negative value; it makes the code worse. Like what is the point of this alignment here? https://github.com/prometheus/client_golang/blob/d03abf3a31c973a5bc2c2dc698fb41b661a0f0c5/prometheus/graphite/bridge.go#L85-L96 . I don't want to distract the conversation too much talking about Go, but I am confident in my opinion that aligning struct declarations is negative value to the code. Readability is a highly subjective experience, and you are welcome to disagree with me on this (clearly the designers of go fmt disagreed with me.). However, this subjective bickering doesn't really matter that much.)

My appeal is that we need to overcome the diff noise counter argument in order to be in favor of this alignment. My opinion vs your opinion about readability and aesthetics doesn't overcome the objective, measurable diff noise problem. Preventing bugs could be a counter argument, and none of the data points from real actual use cases in this thread so far have provided any objective argument in favor of aligning field declarations or import declarations or really anything but switch statements, which brings me to the proposal:

Proposal:

Opt into alignment by putting // zig fmt: align in any scope, and it lasts until the indentation level changes. Changes in indentation level delimit line groups (rather than the start and end of the file as the original proposal suggested) in addition to blank lines.

We can see that people already communicate with zig fmt in the above examples from "zig's own codebase" by simply turning it off. Scoping the enablement of alignment to a syntactic block solves the awkward preprocessor-like usage here: https://github.com/ziglang/zig/blob/4f952c7e0e36dab15f9359f55eb8714f8fe92bcf/src/Sema.zig#L1204-L1209

Instead of turning zig fmt off and on, you could just say where you want the alignment.

 const val: Builder.Value = switch (air_tags[inst]) { 
     // zig fmt: align 
     .add            => try self.airAdd    (inst, .normal), 
     .add_optimized  => try self.airAdd    (inst, .fast), 
     .add_wrap       => try self.airAddWrap(inst), 
     .add_sat        => try self.airAddSat (inst), 

     .sub            => try self.airSub    (inst, .normal), 
     .sub_optimized  => try self.airSub    (inst, .fast), 
     .sub_wrap       => try self.airSubWrap(inst), 
     .sub_sat        => try self.airSubSat (inst),

Opt into alignment by putting // zig fmt: align in any scope, and it lasts until the indentation level changes. Changes in indentation level delimit line groups (rather than the start and end of the file as the original proposal suggested) in addition to blank lines.

This definitely should be optional, while I like idea of zig fmt making all code look the same, forcing alignment is a bit much. There is however another way of enabling this behaviour - if the chunk of code (delimited by line breaks) is not aligned, it stays the same, if it's even partially aligned (one extra space is enough) - zig fmt will align whole block:

// no change here
const a = 1;
const bbb = 10;

// this will be aligned
const c   = 11;
const dddddddd = 1;

It should be relatively easy to have some plugin implemented in editor to switch between the two (maybe will help of zig fmt or zls ?) making everyone happy.

@nissarin This would make it easier to opt-in (add a space instead of //zig fmt: align), but much harder to opt-out (delete all superfluous space from the entire block, if you've missed one zig fmt resets your progress).

@rohlem That is why I mentioned plugin in the first place and yes, you could argue that's not exactly dx friendly but at the same time I doubt anyone is using notepad to code in this day and age.

Alternatively this implicit behaviour could take into account only the first line of the block - heck, you could even extend it to make "virtual columns" based on context:

// include function call
     .sub            => try self.airSub    (inst, .normal), 
     .sub_optimized  => try self.airSub    (inst, .fast), 
     .sub_wrap       => try self.airSubWrap(inst), 
     .sub_sat        => try self.airSubSat (inst), 

// just the first "column"
     .sub            => try self.airSub(inst, .normal), 
     .sub_optimized  => try self.airSub(inst, .fast), 
     .sub_wrap       => try self.airSubWrap(inst), 
     .sub_sat        => try self.airSubSat(inst),

There is a gotcha there - if the first entry is already the "longest" you will end up with extra space all the way down (or just need to reorder them). Anyway I just think it should not be the default behaviour, looking at some examples is "eye opening experience" and peppering code with comments (pragmas) is also suboptional.

This is an interesting proposal, which, in a certain sense, aligns (a terrible pun) with the modus operandi of Zig. We are all used to looking at code that has sad hoc alignment (excluding syntactic forms, which provide anchor points); it may look ugly at first, but consider the following argument.

There is excellent quantitative justification for alignment. Provided that one can identify suitable definitions of blocks which should be subject to alignment, formatting of source code reduces the computational complexity of reading an aligned block of code to that of pattern recognition (which humans are quite good at, as we all know) followed by O(1) lookup (with a very small constant factor). Mentally, once one finds the pattern, one's brain will tend to cache this for re-use. Subsequent lookups are dirt cheap, which reduces churn in one's mental cache; this leaves open the possibility that one can store several patterns in one working session.

Compare the above to the present state (in essentially all languages): one must first (in one's mind) parse and tokenize a block of code, then find the pattern (if one exists!). Thus, one pays the time (and space, which is quite limited in the brain) complexity to comprehend a block of code. Ideally, one would pay this cost once per working session, but the reality is that we can hold only so many things in our short term working memory at once, hence, the longer the working session, the higher the probability that the cost is paid more than once.

There is yet another disadvantage to the present state: each time we re-examine the block of code, even if we remember the pattern, we pay the cost to parse and construct tokens, as the source code remains in its original form. One would be correct in arguing that once we have the pattern, the cost to parse + construct tokens is lower than in the original case; however, this cost is almost assuredly greater than O(1).

Note that I did not specify the computational complexity of the performed-by-human-brain parse, construct tokens and recognize pattern operation; it would depend on the complexity of the code block, but, in all cases, we can be sure that it is greater than O(1) (if we truly need a functional form, we could simply substitute the complexity of the parser + lexer which handles said code block). Alignment, as proposed, moves all that work to the machine. Now, we as humans need only recognize the pattern (which, in aligned form, is far easier to see) and then perform O(1) lookup.

I believe that the above argument serves as a counterweight to the added complexity of diffs (due to whitespace changes, which, as noted above, can be ignored anyway). If one puts aside historical preference, it becomes clear that alignment (in fact, aggressive alignment) is the superior quantitative choice. In essence, alignment offloads the inherent complexity of parsing + token construction to the machine, yielding a far easier problem for your human brain. The mental savings correspond to decreased working memory churn (hence, increased probability of connecting disparate thoughts) and lower energy to reach the same result (comprehension of a block of code); if one has an infinite working memory (no one does), then the increase in mental efficiency is still undeniable.

------------*---- Yes, yes, I am an interloper, but a well-meaning one.

I would propose a format file that is read by fmt. more like .clang-format It would be best, because one cannot please everyone.

ziglang / zig