ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.59k stars 2.53k forks source link

Proposal: A new formulation of anytype-based generics #9272

Open agathazeren opened 3 years ago

agathazeren commented 3 years ago

Proposal: A new formulation of anytype-based generics

In situations other than fn f(x: anytype) usize { ... }, the semantics of anytype can be quite confusing, and how potential features, such as *anytype and []anytype do not have clear semantics. This proposal attempts to address this by setting forth a new formulation of anytype-style generics.

Functions

The first major change in this proposal is the removal of comptime parameters. Instead, either entire functions may be comptime, and there are two new parameter-level attributes: inline and inline_type, and comptime is a function-level attribute.

comptime fn

comptime fn Node(T: type) type {
    return struct {
        elem: T,
        next: *@This(),
    };
}

comptime fns are functions that can only be called at compile time, and as such, they may take parameters of comptime-only types, such as type or comptime_int, and return comptime-only types. They are never emitted.

inline parameter attribute

fn eql(inline T: type, a: []T, b: []T) bool {
    if (a.len != b.len) return false;
    for (a) |elem, i| {
        if (elem != b[i]) return false;
    }
    return true;
}

inline causes a function to be emitted many times in the generated code, one for each value passed in, similar to Rust f<T>() or f<const I>(), when called at runtime. inline parameters may contain comptime-only types, or they may contain normal types.

inline_type parameter attribute

fn print(inline format: []const u8, inline_type args: any) void { ... }

inline_type causes a function to be emitted once for each type it is called with, will the value is passed at runtime, similar to f<T>(t: T) in Rust. It can only be used on prototype types, the most common of which is any.

any

Though any is very similar to the existing anytype, it is a different concept than anytype, and I don't like anytype as a name.

Grammar-wise, any is the same kind of special identifier as type.

comptime types

comptime struct Options {
    meta: type,
    level: log.Level,
    errors: type,
}

structs, enums, and unions can be made comptime with a prefix keyword attribute. If field of a struct or union, or the tag type of a union, is comptime-only, then the type must be marked as comptime, though one of these need not be present for a type to be marked comptime. A slice of, array of, error union of, or pointer to a comptime-only type is implicitly comptime-only.

const AnyPair = comptime struct {
    a: any,
    b: any,
};
fn main() void {
    var p = AnyPair {.a = "hi", .b = 3 };
    p.a = p.b;
    p.b = [_]u32{1,2,3};
}

When any is used as a field, it is a comptime-only type, and can be changed to values of any type.

prototype types.

const ExternalEndpoint = prototype struct {
    inline log_level: log.level,
    inline_type connection: any,
    cache: Cache
}

prototype types are a new concept introduced by this proposal. If any struct field is marked inline or inline_type, then the type must be marked prototype. Additionally, any is a prototype type. enums and unions cannot be marked prototype. A slice, array, pointer, or error union of a prototype type is They cannot be used as a normal parameter, only as a parameter to a comptime function, or an inline or inline_type parameter (and, only prototype types may be behind a inline_type parameter). As a parameter to a comptime function, the prototype must not be behind a non-const pointer or slice, but otherwise it works just like any other comptime-only type, without the inline or inline_type directives. Behind a inline_type parameter attribute, all the inline and inline_type attributes work as though they were just on parameters to the function. Behind a inline directive, all fields work as though they were behind an inline directive.

EDIT: Originally there was a different example here, but I found a number of errors in it, so I've replaced it with a new and better example.

const P = prototype struct {
    inline x: usize,
    inline_type y: any,
    z: usize,
}

fn f(inline a: P, inline_type b: P, c: bool) void { 
    print(
        "{} {} {} {} {} {} {} {}", 
        a.x, a.y. a.z, b.x, b.y, b.z, c,
    );
}

fn main() void {
    const j = P {
        .x = 3,
        .y = 3.14,
        .z = 4,
    };
    const k = P {
        .x = 5,
        .y = "hi",
        .z = 6,
    };
    f(j,j, true);
    f(j,j, false);
    f(j,k, true);
    f(j,k, false);
    f(k,j, true);
    f(k,j, false);
    f(k,k, true);
    f(k,k, false);
}

Compiles to something represented by the pseudocode:


const P__0 = struct {
    y: f64,
    z: usize,
};
const P__1 = struct {
    y: []const u8,
    z: usize,
};

fn f__0(b: P__0, c: bool) void {
    print(
        "{} {} {} {} {} {} {} {}", 
        3, 3.14. 4, 3, b.y, b.z, c,
    );
}
fn f__1(b: P__1, c: bool) void { 
    print(
        "{} {} {} {} {} {} {} {}", 
        3, 3.14. 4, 5, b.y, b.z, c,
    );
}
fn f__2(b: P__0, c: bool) void { 
    print(
        "{} {} {} {} {} {} {} {}", 
        5, "hi". 6, 3, b.y, b.z, c,
    );
}
fn f__3(b: P__1, c: bool) void { 
    print(
        "{} {} {} {} {} {} {} {}", 
        5, "hi". 6, 5, b.y, b.z, c,
    );
}

fn main() void  {
    const j = P__0 {
        .y = 3.14,
        .z = 4,
    };
    const k = P__1 {
        .y = "hi",
        .z = 6,
    };
    f__0(j, true);
    f__0(j, false);
    f__1(k, true);
    f__1(k, false);
    f__2(j, true);
    f__1(k, false);
    f__3(k, true);
    f__1(k, false);
}

Since P is a prototype struct, you cannot create a non-comptime variable of it:

var p: P = P{ .x = 3, .y = "hi", .z = 4 }; // Error
inline x: usize,

An inline field means that a new version of the struct is emitted for each value inside the field. So for instance here P__0 corresponds to .x = 3.

inline_type y: any,

An inline_type field means that a new version of the struct is emitted for each type of the field. Only prototype types (including any) may be the field type for inline_type fields. In this example, P__0 corresponds to @TypeOf(y) = f64.

z: usize,

z is an ordinary field, and works as such.

Type Form.

The is_comptime field in TypeInfo.Struct is replaced with form: Form field.

const Form = enum {
    @"comptime",
    @"prototype",
    normal,
};

EDIT: Add this. In addition, there is a new function in std.meta for finding the form of any type:

comptime fn form(T: type) Form { ... }

Usage Example: Iterators (and other typeclasses, by a similar means)

comptime fn AnyIterator(Elem: type) type {
    return prototype struct {
        inline impl: Iterator(Elem, any),
        inline_type state: any,

        fn next(self: @This()) ?self.impl.Out {
            return self.impl.next(&self.state);
        }
    };
}

comptime fn Iterator(Elem: type, State: type) type {
    return comptime struct {
        next: fn(*State) ?Elem,
        is_empty: fn(*State) bool,
        count_estimate: fn(*State) usize = defaultCountEstimate,
        // The iterator typeclass could be expanded, or reduced.

        fn defaultCountEstimate(_: *State) usize {
            return 0;
        }

    }
}

const Counting = struct {
    from: usize,
    to: usize,

    fn toIterator(self: Counting) AnyIterator(usize) {
        return AnyIterator {
            .impl = Iterator(usize, Counting) {
                .next = next,
                .is_empty = isEmpty,
            },
            .state = self,
        };
    }

    fn next(self: *Counting) ?usize {
        if (self.from <= self.to) {
            var val = self.from;
            self.from += 1;
            return val;
        }
        return null;
    }

    fn isEmpty(self: *Counting) bool {
        return self.from > to;
    }

}

fn usesIterator(inline_type it: *AnyIterator(usize)) void {
    while(it.next()) |val| {
        debug.print("> {}\n", .{val});
    }
}

fn main() void {
    var it = Counting{.from = 0, .to = 10}.toIterator();
    usesIterator(&it);
}
InKryption commented 3 years ago

I think adding 3 new keywords ( any, prototype, inline_type ) is a lot to solve the problem at hand, given the proposed keywords' scope. I also think that overloading the meanings of comptime and inline adds a bit of mental overhead, especially when you consider the ambiguity that const f = comptime fn(...) would introduce when #1717 is accepted. Basically, I think this is adding a lot more things to remember, without solving the anytype problems that have been discussed, in any more of an elegant way than status-quo. And personally, I think the example you've given is kind of confusing. Not unreadable, but the use of the keywords prototype, inline, inline_type, and any in the AnyIterator is a very loaded set of information.

agathazeren commented 3 years ago

I think adding 3 new keywords ( any, prototype, inline_type ) is a lot to solve the problem at hand, given the proposed keywords' scope. I also think that overloading the meanings of comptime and inline adds a bit of mental overhead,

This propsal actually removes an overloading of comptime, because existing comptime parameters serve both what in this propsal is a comptime fn and what in this proposal is a inline parameter. In addition, anytype currently works both as any in a comptime context and any in an inline_type context, which I find is quite a confusing overloading. I chose the keyword inline to be similar to inline for, but another keyword, such as mono or specialize could be better.

especially when you consider the ambiguity that const f = comptime fn(...) would introduce when #1717 is accepted.

Good point (and this actually doesn't need #1717). I propose requireing the existing comptime expressions to require a block to solve this problem.

Basically, I think this is adding a lot more things to remember, without solving the anytype problems that have been discussed, in any more of an elegant way than status-quo.

Which of the anytype problems (that are not solved at all by the status quo) do you think are not solved in this proposal?

InKryption commented 3 years ago

Which of the anytype problems (that are not solved at all by the status quo) do you think are not solved in this proposal?

I'd like to make clear I don't think this can't solve the same problems as anytype, but I don't think this solves them any better than anytype currently does.

E.g., in fn print(inline format: []const u8, inline_type args: any) void { ... } from your proposal, inline just substitutes status-quo comptime, and inline_type args: any just substitutes anytype, and I don't find either of these substitutions to enhance the readability or the usability. And as for the proposal-given Iterator example, I'm certain the same or very similar could be accomplished in current Zig, using anytype comptime type checks and/or @fieldParentPtr. I'm not in a very good position to try this right now, but I may when I have time.

Having said that, I'd then like to ask what do you believe this proposal solves? Because so far, all I can really discern is some extra syntactic explicitness. Maybe I'm missing something that you can see better.

agathazeren commented 3 years ago

I'd like to make clear I don't think this can't solve the same problems as anytype, but I don't think this solves them any better than anytype currently does.

I agree in the most basic cases this does not solve the problem any better than anytype, but the current anytype design ends up with many problems when you try to build more complicated examples, it completely falls apart, not just in that it does not allow certain constructs, but in the the abstract model it is trying to provide does not hold up. This propsald doesn't, as far as I have stretched it (not saying there aren't holes I haven't found, just that it can at least go much farther than current anytype)

E.g., in fn print(inline format: []const u8, inline_type args: any) void { ... } from your proposal, inline just substitutes status-quo comptime, and inline_type args: any just substitutes anytype, and I don't find either of these substitutions to enhance the readability or the usability.

While I admit in the case of print, it becomes slight more verbose, there are distinctions that are impossible to make in the current anytype, such as the difference between these two structs below. StructField can only exist at compile time, but Take is a runtime struct that causes functions it is passed to to be monomorphized more times.

const StructField = comptime struct {
    name: []const u8,
    field_type: type,
    default_value: ?any,
}
const Take = prototype struct {
    take: usize,
    inline_type inner: any,
}

Back to the print example, also note that the substitution of inline for comptime adds an element of clarity. For instance, the follwing two functions both would take comptime parameters in current zig, but have very different meanings.

fn print(comptime string: []const u8, args: anytype) !void { … } // exists many times in generated code, called at runtime
fn ArrayList(comptime El: type) type { … } // cannot exists in generated code, only called at compile time
⇒
fn print(inline string: []const u8, inline_type args: any) !void { … } // now the two different effects have different notations.
comptime fn ArrayList(El: type) type { … }

And as for the proposal-given Iterator example, I'm certain the same or very similar could be accomplished in current Zig, using anytype comptime type checks and/or @fieldParentPtr. I'm not in a very good position to try this right now, but I may when I have time.

If you can figure out how to do a nominal interface like this with static dispatch, I would be very interested in seeing how you can do it. I've tried several different approaches and they all fall apart.

Having said that, I'd then like to ask what do you believe this proposal solves? Because so far, all I can really discern is some extra syntactic explicitness. Maybe I'm missing something that you can see better.

To me it clears up the quite mixed up semantics of anytype, which acts very differently in different contexts while pretending to be a unified concept. Here, we separate and name these distinct concepts, and give the programmer full access to them to build better abstractions.

ghost commented 3 years ago

@agathazeren Could you explain in more detail how inline fields in prototype structs are supposed to work?

agathazeren commented 3 years ago

Could you explain in more detail how inline fields in prototype structs are supposed to work?

I have updated the post to include a better explanation of prototype structs. Hopefully it is more clear now. If not, please ask again.

ghost commented 3 years ago

Edit: The following series of comments is based on a misconception I had about the Zig type system and is mostly irrelevant to this proposal.

~What I still don't get is this: inline_type monomorphizes on type only, with the value unknown until runtime. inline, on the other hand, monomorphizes on the actual value, which is then required to be known at comptime. In my understanding, this should preclude the use of anything inline in an inline_type context, including prototypes. So I guess my question is, how would the following example work? And in particular, what does g get lowered to?~

const P = prototype struct { inline x: usize };
const S = struct { x: usize };

fn f(inline_type a: P) usize {
    return a.x;
}

fn g(a: S) usize {
    return f(a);
}
agathazeren commented 3 years ago

Ah. That code would not compile, as you can't pass an S to a parameter that requires a P. prototype struct don't work as structural interfaces, but are types themselves. Instead you would write:

fn g(a: usize) usize {
    return f(P{.x = a}); // Error
}

However this still has a problem, because since P.x is inline, it must be comptime known, and a is not guaranteed to be comptime-known.

ghost commented 3 years ago

That code would not compile, as you can't pass an S to a parameter that requires a P. prototype struct don't work as structural interfaces, but are types themselves.

~My bad. Would the below summary describe the semantics correctly?~

~Prototypes pull double duty as templates for the creation of more specialized structs (which then guide the specialization of functions they are passed to), and they are also first-class patterns that specialized structs can be matched against in type signatures.~

~Matching behavior:~ ~ Normal fields in a prototype have to match one-to-one.~ ~ inline_type fields match fields with the same name and either any type or a subset of types spefified by another Prototype.~ ~ inline fields match constants* with the same name and type (that's what confused me initially).~

~So in the example from the OP,~

const P = prototype struct {
    inline x: usize,
    inline_type y: any,
    z: usize,
}

~P would match struct { const x: usize = 5; y: usize, z: usize }, or struct { const x: usize = 0, y: f64, z: usize }, but not struct { y: usize, z: usize } or struct { const x: usize = 5; y: usize, z: usize, other: usize }.~

~Specialization behavior:~ ~Initializing a prototype will generally produce and initialize a more narrow type:~

var j = P { .x = 3, .y = @as(f64, 3.14), .z = 4 };

// desugars to:
const P_3_f64 = struct { const x: usize = 3; y: f64, z: usize };
var j = P_3_f64 { .y = 3.14, .z = 4 };

~Just like with a mixed-time functions, the initializer value for x must be comptime-known, as must be the type of y, while the values of y and z can be runtime known (e.g. supplied from the parameters of a runtime function).~

agathazeren commented 3 years ago

Not really. prototype is not about structurally matching other types. It is about creating a single type, with a nominal identity, that can have a mix of comptime-only and runtime-only values. protoype structs can be better thought of as decomposing at the function level. So,

const P = prototype struct {
    inline x: usize,
    inline_type y: any,
    z: usize,
}

fn f(a: f64, inline_type p: P) void {}

decomposes into something like

fn f(a: f64, inline p.x: usize, inline_type p.y: any, p.z: usize) void {}
ghost commented 3 years ago

It is about creating a single type, with a nominal identity.

~Ok, I'm officially confused.~

~Presumably you know that Zig's type system is structural to the bone. So, in what exact sense are prototype structs "nominal"?~

const P = prototype struct { x: usize, };
const Q = prototype struct { y: usize, };

~Would you consider P and Q to be interchangeable, or are they distinct types?~

ghost commented 3 years ago

~Gah, I meant the fields to have the same name, of course. The corrected example is~

const P = prototype struct { x: usize, };
const Q = prototype struct { x: usize, };

~Otherwise the question doesn't even make sense.~

ghost commented 3 years ago

~I think I get it now. Just to make sure, here are some more cases:~

const R = prototype struct { inline x: usize };
const S = prototype struct { inline x: usize };

const r0 = R{ .x = 0 };
const r1 = R{ .x = 1 };
const s0 = S{ .x = 0 };

const T = prototype struct { inline_type x: any };
const U = struct { x: usize };

const t = T{ .x = @as(usize, 1) };
const u = U{ .x = 1 };

~1. R == S: True - prototypes are distinguished only by their structure.~ ~2. @TypeOf(r0) == @TypeOf(s0): True - identical specializations of identical prototypes give same type.~ ~3. @TypeOf(r0) == @TypeOf(r1): Not sure - on one hand, different specializations trigger monomorphization of functions, but you say that they are still considered the same type?~ ~4. t == u: False - prototype specializations are considered distinct from ordinary structs, even if they are structurally identical. That's the "nominal" part.~

~Is that correct?~

agathazeren commented 3 years ago
  1. R != S

    test {
    const S = struct { x: usize };
    const R = struct { x: usize };
    
    try testing.expect(comptime S != R);
    }

    Just as normal structs have their own nominal identities, prototype structs do too.

  2. @TypeOf(r0) != @TypeOf(s0) They are from different prototypes.

  3. @TypeOf(r0) != @TypeOf(r1) const specializes the type.

  4. t != u T and U have different nominal identities.

ghost commented 3 years ago

Wow, I've managed to hold such a deep misconception about the Zig type system for quite a while, without it ever getting in my way. I implicitly assumed from things like anonymous structs and computed generics that Zig has structural types. But you're absolutely right, structs are nominal and ArrayList(T) only works because all comptime calls are memoized. Thanks for clearing this up! And sorry for the noise.

ghost commented 3 years ago

If you can figure out how to do a nominal interface like this with static dispatch, I would be very interested in seeing how you can do it. I've tried several different approaches and they all fall apart.

This is only slightly above duck-typing in terms of rigor, but the intent to implement a certain nominal interface could be encoded in status-quo Zig like this:

const std = @import("std");

// Creates a zero-sized tag to identify the interface
fn AnyIterator(comptime Elem: type) type {
    return struct { const Elem = Elem; };
}

const Counting = struct {
    // Signals use of the interface
    impl_AnyIterator: AnyIterator(usize) = undefined,

    from: usize,
    to: usize,

    fn next(self: *Counting) ?usize {
        if (self.from <= self.to) {
            const val = self.from;
            self.from += 1;
            return val;
        }
        return null;
    }

    fn isEmpty(self: *Counting) bool {
        return self.from > self.to;
    }
};

fn usesIterator(it: anytype) void {
    // Trigger error if interface is not "implemented"
    if (@TypeOf(it.impl_AnyIterator) != AnyIterator(usize)) {
        @compileError("Type does not implement AnyIterator(usize)");
    }

    while(it.next()) |val| {
        std.debug.print("> {}\n", .{val});
    }
}

pub fn main() void {
    var it = Counting {.from = 0, .to = 10};
    usesIterator(&it);
}

A more robust solution would be to define a isAnyIterator function that checks for the presence of the required fields and methods as well (using @TypeInfo and/or std.meta). But that would admittedly be very messy.


Edit: In the original version I forgot to actually check the type of the "interface" field. This is now fixed. But I've also thought of a better solution:

const Counting = struct {
    pub fn implements(_: *@This(), comptime Interface: type) void {
        switch (Interface) {
            AnyIterator(usize) => {},
            // more interfaces, if necessary
            else => @compileError("Interface not implemented"),
        }
    }
    //...
}

fn usesIterator(it: anytype) void {
    it.implements(AnyIterator(usize));
    // ...
}
agathazeren commented 3 years ago

That second solution is probably the best that I've seen. The two main drawbacks I see is that it is note extensible and should two different interfaces require the same method name, they cannot both be implemented.

Regardless, I still think this proposal is useful for clearing up other inconsistencies with anytype.