ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

remove T{} syntax in favor of type coercion #5038

Open andrewrk opened 4 years ago

andrewrk commented 4 years ago

The following are semantically equivalent:

const a = T{};
const a: T = .{};
const a = @as(T, .{});

I propose to remove the first possibility, and rely on the other two.

The problem with this proposal as I can see it is inferred-size array literals:

const a = [_]u8{1, 2, 3};

There's no other way to write this. And if that works, then why wouldn't this work?

const a = [3]u8{1, 2, 3};

But now we're back to the proposal:

const a = [3]u8{1, 2, 3};
const a: [3]u8 = .{1, 2, 3};
const a = @as([3]u8, .{1, 2, 3});
JesseRMeyer commented 4 years ago

What problem does this solve?

const a = T{}; is the simplest expression of the three, in terms of concepts necessary to define the assignment semantics. .{} introduces anonymous structs, and @as() introduces a compiler 'function'.

I'm not convinced reducing the number of ways to define an assignment is worth removing the simplest solution to the problem, if I understand this correctly.

jakwings commented 4 years ago

How about this?

const a = T.{};

const a = [_]u8.{};
const a = [_]u8.{1, 2, 3};
emekoi commented 4 years ago

@iology see #760, specifically this part.

mogud commented 4 years ago
// I prefer this
const a: [_]u8 = .{1, 2, 3};
foobles commented 4 years ago

I would definitely be in favor of this, just as long as the . in .{} is removed at the same time, which seems to be in the works.

jibal commented 2 years ago

One could remove quoted string literal syntax, since array literals could be used instead, or require all integer literals to be encoded in 0b<binary> form, but these obviously carry "only one way to do it" too far ... arguably this does too. And this proposal doesn't achieve the favored principle anyway, since you still have

const foo: Foo = .{};

and

const foo = @as(Foo, .{});

And if there are any cases where @as would be required then I think that's a good reason to reject this proposal.

Speaking of which,

fn foo() !Foo {
    return .{ .field = try something() };
}

doesn't work because the compiler infers the type that isn't legal here (the error union), rather than the one that is (Foo), so with this proposal one must do either [first way to do it]

fn foo() !Foo {
    return @as(Foo, .{ .field = try something() });
}

or [second way to do it]

fn foo() !Foo {
    const result: Foo = .{ .field = try something() };
    return result;
}

Casts are to be avoided, temporary variables shouldn't be necessary, type inference is nice in some places but can be hard to read and understand in others. Foo{.field = ...} is clear and has none of the disadvantages of the other ways (its one disadvantage over the anonymous syntax is violation of DRY, but the anonymous syntax can be used when the tradeoffs favor it. Yes, tradeoffs ... sometimes there are good reasons to have more than one way to do things.)

jibal commented 2 years ago

@theInkSquid

I would definitely be in favor of this, just as long as the . in .{} is removed at the same time, which seems to be in the works.

It's not in the works ... #5039 was closed two days before your comment.

MKRhere commented 1 year ago

@jibal Both proposals are back on the table and I'm excited/hopeful for both again. :)

kuon commented 1 year ago

I think it should stay because of:

var foo = T{};
...
foo = T{}; // makes it clear what we are assigning

I also do not see what we would gain from removing it.

misanthrop commented 1 year ago

This will make usage of anytype arguments inconvenient.

Example:

pub const Vec3 = @Vector(3, f32);

pub fn dot(a: anytype, b: @TypeOf(a)) std.meta.Child(@TypeOf(a)) {
  return @reduce(.Add, a*b);
}

test {
  _ = dot(@as(Vec3, .{1, 2, 3}), .{3, 2, 1}); // Looks ugly
  const a: Vec3 = .{1, 2, 3}; // Vec3{1, 2, 3} looks better
  const b: Vec3 = .{3, 2, 1};
  _ = dot(a, b);
}

With the change, it will always be more convenient to use type arguments:

pub fn dot(comptime T: type, a: T, b: T) std.meta.Child(T) {
  return @reduce(.Add, a*b);
}

test {
  _ = dot(Vec3, .{1, 2, 3}, .{3, 2, 1}); // Looks OK?
  const a: Vec3 = .{1, 2, 3};
  const b: Vec3 = .{3, 2, 1};
  _ = dot(Vec3, a, b); // Vec3 here is redundant
}
misanthrop commented 1 year ago

Another problem is a tuple with typed elements:

const t = .{ .a = A{}, .b = B{} }; // works now, but would not work under this proposal
const t = .{ .a: A = .{}, .b: B = .{} }; // syntax is not supported

 // this is how it would look:
const t1 = .{ .a = @as(A, .{}), .b = @as(B, .{}) };
const t2 = .{ @as(A, .{}), @as(B, .{}) };

Maybe in some cases a tuple with tuple elements (just .{}) will coerce properly when used, but it might decrease readability, as @kuon pointed out earlier.
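To illustrate the coercion case mentioned above, here is a minimal sketch (the struct names A and B are hypothetical) of anonymous elements coercing only once they reach a typed location:

```zig
const A = struct { x: i32 = 0 };
const B = struct { y: i32 = 0 };

test "anonymous tuple elements coerce at the use site" {
    // The elements stay anonymous structs inside the tuple...
    const t = .{ .a = .{ .x = 1 }, .b = .{ .y = 2 } };
    // ...and only coerce once assigned to a typed location.
    const a: A = t.a;
    const b: B = t.b;
    _ = a;
    _ = b;
}
```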

mlugg commented 9 months ago

Here's an idea of how to handle inferred-size array literals.

Currently, we have a special case to allow [_]T{ ... } syntax in array literals, where otherwise the [_]T part would be considered a normal type. I propose that we move this special case: rather than array literals, this syntactic form is permitted as a type annotation for a const or var decl (global or local). Like today, it is an exact syntactic form: for instance, const x: ([_]T) = ... is invalid just as ([_]T){ ... } is invalid today. When a const/var is marked with this "type", the initialization expression is given a new result location type. In terms of implementation, it will look something like this:

/// This expression is the initialization expression of a var decl whose type is an inferred-length array.
/// Every result sub-expression must use array initialization syntax. The array's length should be written
/// to `chosen_len` so the caller can retroactively set the array length.
inferred_len_array_ptr: struct {
    /// The array pointer to store results into.
    ptr: PtrResultLoc,
    /// This is initially `null`, and is set when an expression consumes this result location.
    /// If an expression has a length which does not match the currently-set one, it can use `src_node` to emit an error.
    chosen_len: *?struct {
        len: u32,
        src_node: Ast.Node.Index,
    },
},

The idea here is that every peer here must be an array initialization expression (.{ ... }), and their lengths must match. The var/const decl will create an alloc instruction for an array type whose length is rewritten to the correct value after lowering the init expression.

This result location type will trigger an error in all cases other than array initializers, such as struct inits and calls through to rvalue.

In practice, here's what this means:

// these are all valid
const x: [_]u8 = .{ 1, 2, 3 };
const y: [_]u8 = if (condition) .{ 1, 2 } else switch (x) {
    .foo => .{ 3, 4 },
    .bar => .{ 5, 6 },
    else => .{ 7, 8 },
};
const z: [_][]const u8 = blk: {
    if (foo) break :blk .{ "hello", "world" };
    break :blk .{ "foo", "bar" };
};

// this is invalid
// error: array length cannot be determined
// note: result must be array initialization expression
const a: [_]u8 = @as([3]u8, .{ 1, 2, 3 });
const b: [_]i16 = blk: {
    const result: [2]i16 = .{ 1, 2 };
    break :blk result;
};
const c: [_]u8 = if (cond) .{ 1, 2 } else something_else;

// this is also invalid
// error: array length '3' does not match array length '2'
// note: array with length '2' here
// note: inferred-length array must have a fixed length
const d: [_]u8 = if (cond) .{ 1, 2 } else .{ 3, 4, 5 };
DerpMcDerp commented 2 months ago

cpp2 uses the following syntax:

name: type = expr;

and allows you to omit at most one:

name := expr; // define name, type is inferred
name: type;   // define name, initialized later
:type = expr; // create anonymous r-value

so if :type = expr syntax is borrowed from cpp2 you can remove both @as and T{} from Zig:

:[3]u8 = .{1, 2, 3}
@as([3]u8, .{1, 2, 3}) // removed
[3]u8{1, 2, 3} // removed
arthurmelton commented 2 months ago

If you were to follow up on this and require the type annotation, what about functions like ArrayList? To my mind, these declarations feel similar, with the type appearing on the right side.

const a = [0]T{};
const a = std.ArrayList(T).init(std.heap.GeneralPurposeAllocator);

If you want to try to remove the T{} syntax, the way to write the code would be the following:

const a: [0]T = .{};
const a = std.ArrayList(T).init(std.heap.GeneralPurposeAllocator);

To me, this just feels wrong, as the two declarations no longer share the same "way" of being written. If you want to remove the type from the right-hand side, then you should perhaps also try to remove it from functions like ArrayList and alloc.

castholm commented 2 months ago

@arthurmelton See #9938 (decl literals), which would enable

const a: [0]T = .{};
const a: std.ArrayList(T) = .init(allocator);
mlugg commented 2 months ago

I'd like to make one more argument against T{ ... } syntax.

Nowadays, the only place I ever really write T{ ... } myself is for certain APIs in the compiler. When doing DOD tricks, we want to be able to unpack arbitrary structs into a big flat array of values stored elsewhere (generally named extra). We have helper functions called addExtra to do this; so, we write code like foo.addExtra(TypeToStore{ ... }).

The problem here is that this is actually kind of type-unsafe. What would happen if I instead wrote foo.addExtra(.{ ... }) by mistake? Well, the function would be passed an anonymous struct which has a potentially different field order. It would probably compile fine, but if the field order were different, this would cause bugs down the line, since we would unpack values from the extra array in the wrong order. After discussing this a little with @silversquirl, it led us to a key observation.

Any given API generally expects a typed struct or an anonymous struct, rather than accepting either. Passing a typed struct where an anonymous struct is expected, or vice versa, is likely to lead to bugs.

At first glance, it may seem that if anything, this is an argument in favor of T{ ... } syntax: it's a separate form for when we want to use typed structs. The problem is that this difference is completely superficial. There's nothing to actually force you to write one over the other, so -- unless the API is doing its own weird type checks of some kind -- it doesn't really prevent this bug from slipping in.

Now, suppose that this proposal was implemented, so that .{ ... } became the only syntax to directly initialize a struct. What would these APIs look like then? A function argument which expects an arbitrary "concrete" struct type would take parameters comptime T: type, x: T. So my call above would become foo.addExtra(TypeToStore, .{ ... }), which is impossible to mess up. An API which expects an anonymous struct (something like std.Build.dependency) continues to take anytype; you can't get the syntax wrong, because there's only one way to init the struct, and reaching for @as in this context should be a pretty solid sign that you're doing something wrong. In essence, under this proposal, the form of an API implicitly prevents you from using it incorrectly by passing the wrong "kind" of struct.

Flaminator commented 2 months ago

@mlugg

I assume this addExtra API uses anytype as its parameter? I see this more as an issue with anytype and anonymous struct literal syntax than with T{ ... }. Even after removing the T{ ... } syntax, you will still run into the issue you described.

You can see that, for example, in the following code:

const S2 = struct{
    x: i32,
    y: i32,
};

fn what_is(x: anytype) void
{
//  do stuff here
    _ = x;
}

fn any_test(i: i32) void
{
    const s1 = S2{.x = i+0, .y = i+1};
    const s2 = .{.x = i+2, .y = i+3};

    what_is(s1);                 // 1
    what_is(s2);                 // 2
    what_is(S2{.x=i+4, .y=i+5}); // 3
    what_is(.{.x=i+6, .y=i+7});  // 4
    what_is(.{.y=i+8, .x=i+9});  // 5
}

The calls 1 and 3 are the same, the calls 2 and 4 are the same, and call 5 is different. So the difference will still be there; the only thing that will probably change is that people are more likely to either use @as or have their code accidentally do the wrong thing, because they are working with anonymous structs instead of a real typed struct.

Removing T{ ... } syntax is also a problem in the following piece of code when switching the parameter type of p1 or p2 in abcdef to S2 or S1. Here it is actually the anonymous struct literal syntax that will silently compile when this could result in bugs:

const S1 = struct{
    x: i32,
    y: i32,
};

const S2 = struct{
    x: i32,
    y: i32,
};

fn abcdef(p1: S1, p2: S2) void
{
//  do stuff here
    _ = p1;
    _ = p2;
}

// Today
fn ghi() void
{
    const s1 = S1{.x = 1, .y = 2};     // const s1: S1 = .{.x = 1, .y = 2};
    const s2: S2 = .{.x = 3, .y = 4};  // const s2 = S2{.x = 3, .y = 4};

    // Will silently compile parameter type change     | 1 | 2 |
    abcdef(S1{.x = 1, .y = 2}, S2{.x = 3, .y = 4}); // | N | N |
    abcdef(.{.x = 1, .y = 2}, .{.x = 3, .y = 4});   // | Y | Y |
    abcdef(.{.x = 1, .y = 2}, S2{.x = 3, .y = 4});  // | Y | N |
    abcdef(S1{.x = 1, .y = 2}, .{.x = 3, .y = 4});  // | N | Y | 
    abcdef(s1, S2{.x = 3, .y = 4});                 // | N | N |
    abcdef(s1, .{.x = 3, .y = 4});                  // | N | Y |
    abcdef(S1{.x = 1, .y = 2}, s2);                 // | N | N |
    abcdef(.{.x = 1, .y = 2}, s2);                  // | Y | N | 
    abcdef(s1, s2);                                 // | N | N |
}

// Tomorrow
fn jkl() void
{
    const s1: S1 = .{.x = 1, .y = 2};
    const s2: S2 = .{.x = 1, .y = 2};

    // Will silently compile parameter type change   | 1 | 2 |
    abcdef(.{.x = 1, .y = 2}, .{.x = 3, .y = 4}); // | Y | Y |
    abcdef(s1, .{.x = 3, .y = 4});                // | N | Y |
    abcdef(.{.x = 1, .y = 2}, s2);                // | Y | N |
    abcdef(s1, s2);                               // | N | N |
}

Or you would have to start littering your code with @as calls every time, which doesn't improve readability at all, even if it does make your code "correct".

So imo your example is not really a valid argument against removing T{ ... } syntax.

expikr commented 1 month ago

If @as is so indispensable as to displace existing language syntax, why not simply unify it as part of the language as opposed to being a builtin function?

I propose incorporating @as into the language itself via a streamlined T.{} dot syntax that is unified between primitive and compound types:

const a: i32 = 1;
const b = i32.{1};

const c: [3]i32 = .{1, 2, 3};
const d = [3]i32.{1, 2, 3}; // or [_]i32.{1, 2, 3};

const e: Vec2 = .{.x=1, .y=2};
const f = Vec2.{.x=1, .y=2};

const NativeFloat = if (8==@sizeOf(usize)) f64 else f32;
const g = NativeFloat.{0.25};

The three disparate syntax T{}, [n]T{} and @as(T,.{}) are removed and replaced with a single cohesive T.{} syntax.

expikr commented 1 month ago

> Here's an idea of how to handle inferred-size array literals. […]

I would advise turning this suggestion into its own separate proposal to potentially expedite its acceptance. Regardless of whether the parent proposal is accepted, I think moving the inferred array notation to type annotations is a significant improvement in its own right.

Fri3dNstuff commented 2 weeks ago

After originally being very opposed to the proposal, I've come to like it quite a bit. Here is an argument in favour of it that I haven't seen mentioned so far:

Though using type coercion for aggregate types is quite alien at first (especially coming from the many languages that require the programmer to name a type in order to create a value of it), if you think about it for some time, you'll see that we already use this coercion system quite a bit:

integer literals have type comptime_int, we use coercion to convert them to usize, i32, etc.

float literals have type comptime_float, we use coercion to convert them to f64, f32, etc.

string literals have types *const [N:0]u8 for some N, we use coercion to convert them to []const u8.

.{} can be thought of as the aggregate value literal.
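The coercions listed above can be seen side by side in a short sketch (the Point struct is hypothetical):

```zig
const Point = struct { x: i32, y: i32 };

test "literals coerce to their annotated types" {
    const n: usize = 42; // comptime_int -> usize
    const f: f32 = 1.5; // comptime_float -> f32
    const s: []const u8 = "hello"; // *const [5:0]u8 -> []const u8
    const p: Point = .{ .x = 1, .y = 2 }; // anonymous literal -> Point
    _ = n;
    _ = f;
    _ = s;
    _ = p;
}
```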