ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Proposal: Type variables #4821

Closed Qix- closed 3 years ago

Qix- commented 4 years ago

Zig seems to be set up to support generic functions quite well, but is lacking some of the flexibility or expressiveness I'd personally hope for. Currently, you can use param: var (or, soon to be, param: anytype if #4820 lands) and then @TypeOf to infer the types.

fn add(a: var, b: @TypeOf(a)) @TypeOf(a) {
    return a + b;
}

add(1, 2) //-> 3

However, this is a bit messy not only in readability but also in maintainability; if the type of a changes, but b and/or the return type should not, then all three types will have to be updated.

There is another way of doing it, with first-class comptime type variables:

fn add(comptime T: type, a: T, b: T) T {
    return a + b;
}

add(i32, 1, 2) //-> 3

But this requires explicit type specification at the call sites, which also adds a bit of maintenance overhead.

What would be helpful is some syntactic sugar: Type identifiers prefixed with an immediate $ (e.g. $FooType but not $ FooType) are type variables that can be used in place of types anywhere within the function (including the signature) and the compiler will automatically deduce them.

It may even be syntactic sugar for normal type variables (the second form above) if that's easier to implement; it would also be easier to reason about.

For example, the above signature becomes:

fn add(a: $T, b: $T) $T {
    return a + b;
}

add(1, 2) //-> 3

This last form should be semantically very close to the first form (depending on how the type checking would be implemented) and identical to the second form (if this proposal were implemented as syntactic sugar for type variables).

squeek502 commented 4 years ago

if the type of a changes, but b and/or the return type should not, then all three types will have to be updated

It's unclear to me how the proposal affects this. Wouldn't $T have the same problem?

Tetralux commented 4 years ago

fn add(a: $T, b: $T) $T {
    return a + b;
}

add(1, 2) //-> 3

This is similar to what Odin does; there, you only specify the $ on the first usage of the new type name to indicate where it's being declared:

add :: proc(a: $T, b: T) -> T {
    return a + b;
}

To this end, Zig's could be:

fn add(a: $T, b: T) T {
    return a + b;
}

As a sidenote, Odin also has extra specialization syntax for this where you can do stuff like this:

sum :: proc(elems: $T/[]$E) -> E { // T=[]E, E=elem type
    tot := E(0);
    for e in elems {
        tot += e;
    }
    return tot;
}

... meaning that sum can then be passed a slice of any numeric type.
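
For comparison, here is a rough, hedged sketch of the same shape in Zig as it exists today (anytype being the post-#4820 spelling of var), recovering the element type from the argument's type with std.meta.Elem instead of pattern-matching it in the signature:

const std = @import("std");

// Illustrative sketch only: the element type is recovered via std.meta.Elem
// rather than being pattern-matched as $E in the signature.
fn sum(elems: anytype) std.meta.Elem(@TypeOf(elems)) {
    const E = std.meta.Elem(@TypeOf(elems));
    var tot: E = 0;
    for (elems) |e| tot += e;
    return tot;
}

test "sum over a slice of u32" {
    const xs = [_]u32{ 1, 2, 3 };
    try std.testing.expectEqual(@as(u32, 6), sum(@as([]const u32, &xs)));
}

The "numeric" constraint only surfaces when tot += e fails to compile for a given element type, which is weaker than what Odin's $T/[]$E spells out in the signature.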

Tetralux commented 4 years ago

You can also declare multiple "polymorphic" types in the signature if you'd like:

add :: proc(a: $T, b: $U) -> T {
    return a + T(b);
}

mogud commented 4 years ago

Wow, I love this proposal. It resolves #4820 and gives an elegant way to support generic params.

BarabasGitHub commented 4 years ago

It's not that I don't like this proposal, but I fail to see how it solves your example problem.

add :: proc(a: $T, b: T) -> T {
    return a + b;
}

Isn't this exactly the same, except for giving it a name, as

fn add(a: var, b: @TypeOf(a)) @TypeOf(a) {
    return a + b;
}

?? Or maybe that was your point?

Another question I have is about this example:

fn add(a: $T, b: $T) $T {
    return a + b;
}

What do you expect to happen if a changes type? A compile error?

I agree the explicit type parameters can get messy, but it also makes it a bit easier to reason about what types things are. I'm fearful of moving in the same direction as C++ templates, where you get weird errors about types which you have no idea where they come from or what they actually are.

schroffl commented 4 years ago

@Qix- You used this example, but what is T in this case?

fn add(a: $T, b: $T) $T {
    return a + b;
}

// comptime_int / u8 / ... ?
add(1, 2) //-> 3

Especially when the deduced type is not the one you want, you have to specify it explicitly again:

_ = add(@as(f32, 1), 2);

That being said, I actually like the current way of doing

fn myfunction(comptime T: type, a: T, b: T) T {}

it makes it very obvious what is going on when calling a "generic" function.

ghost commented 4 years ago

That being said, I actually like the current way of doing

fn myfunction(comptime T: type, a: T, b: T) T {}

it makes it very obvious what is going on when calling a "generic" function.

This works great when one type parameter is "shared". It doesn't work so well in contexts like this though:

// exaggerated example for the sake of demonstration:
fn myFunction(comptime T1: type, comptime T2: type, comptime T3: type, comptime T4: type, a: T1, b: T2, c: T3) T4 {}

// less messy, but less information
fn myFunction(a: var, b: var, c: var) DeduceReturnType(a, b, c) {}

// or if it was possible to "attach" type predicates to the var keyword
// type predicate: fn(type) bool
fn myFunction(a: var(isType1), b: var(isType2), c: var(isType3)) DeduceReturnType(a, b, c) {}
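
Something close to these predicates can already be approximated in userspace today; a rough, hedged sketch (isInteger and double are hypothetical names, not std APIs):

// A type predicate in the fn(type) bool sense described above.
fn isInteger(comptime T: type) bool {
    return @typeInfo(T) == .Int;
}

fn double(a: var) @TypeOf(a) {
    // Reject unwanted instantiations with a readable message.
    comptime {
        if (!isInteger(@TypeOf(a)))
            @compileError("double: expected an integer type, got " ++ @typeName(@TypeOf(a)));
    }
    return a * 2;
}

// double(@as(u8, 3)) //-> 6
// double(1.5)        //-> compile error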

BarabasGitHub commented 4 years ago

At some point you also have to wonder what you're doing that you need so many generic type parameters which are all (potentially) different.

I've been wondering, what are valid use cases for var? And why pick var over an explicit type? One example I can think of is when using 'tuples'/structs with unnamed fields, such as for string formatting. But are there any other cases than tuples where var is obviously better or the only possible choice?

My question is: Should var just be used for tuples and actually be renamed to anytuple? And then just use explicit types for all the other cases?
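
To illustrate the tuple case: an argument tuple has an anonymous struct type that can't be written out by hand, so var (or reflection over its @TypeOf) is the only way to accept one. A hedged sketch, with a hypothetical countArgs:

// The type of .{ 1, "two", 3.0 } is an anonymous struct with unnamed fields,
// so there is no explicit type annotation that could replace `var` here.
fn countArgs(args: var) usize {
    return @typeInfo(@TypeOf(args)).Struct.fields.len;
}

// countArgs(.{ 1, "two", 3.0 }) //-> 3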

schroffl commented 4 years ago

@user00e00 Good point. I never had a function with more than one type parameter, though. So I can't really comment on that.

Qix- commented 4 years ago

@BarabasGitHub Because explicit types require knowing the type up-front, which means either duplicating types for explicit calls, or using @TypeOf repeatedly - namely when chaining generic functions and mixing var and comptime T: type.

fn add(comptime T: type, l: T, r: T) T {
    return l + r;
}

fn mul(comptime T: type, l: T, r: T) T {
    return l * r;
}

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

This is, of course, an extreme example - it's meant more to demonstrate how complicated things can get, especially when mixing the two paradigms together.

However, with type variables, none of this is an issue - the boilerplate is reduced to about the minimum possible (short of having untyped parameters entirely) while still enforcing proper types.

// Proposed syntax
fn add(l: $T, r: $T) $T {
    return l + r;
}

fn mul(l: $T, r: $T) $T {
    return l * r;
}

fn muladd(l: $T, r: $T, a: $T) $T {
    return add(mul(l, r), a);
}

AssortedFantasy commented 4 years ago

I personally dislike the proposal. I myself prefer highly verbose types but even despite my own bias I think that there are some pretty big issues here.

In particular take the example you gave here:

fn add(comptime T: type, l: T, r: T) T {
    return l + r;
}

fn mul(comptime T: type, l: T, r: T) T {
    return l * r;
}

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

I'd imagine that most programmers, even without knowing much about Zig or anything about the compiler, can fully reason about how the return types get deduced and therefore how the implicit casting rules would behave.

For muladd, if r isn't an l or can't be safely cast implicitly to an l, this is a compile error. The same is true for a, and the return type is always the same as that of l.

// Proposed syntax
fn add(l: $T, r: $T) $T {
    return l + r;
}

fn mul(l: $T, r: $T) $T {
    return l * r;
}

fn muladd(l: $T, r: $T, a: $T) $T {
    return add(mul(l, r), a);
}

Here it's not immediately obvious how the compiler would deduce what $T is. Clearly something must happen, but who knows? Does this just enforce that all of l, r, and a are the exact same type? But why? Zig allows implicit type coercion on function calls, so why do generic functions get to be special? Does this mean that $T should be @TypeOf(l)? What if I would rather have this all coerce to @TypeOf(r)?

Now, it's not actually possible to do so, as the order of the arguments matters, and if you tried using @TypeOf(r) for the type of l and left r as var then you'd get an undeclared identifier error. Knowing this, you could deduce that $T must be @TypeOf(l).

That's the beauty of the verbose syntax though: you don't need to understand much at all to know why calling add(f64, f32) returns an f64 and why add(f32, f64) is a compile error.

Maybe I'm just being salty here, but this whole $T syntax reminds me a lot of Java's generics where I've seen some pretty frustrating compile errors appear because of really complicated deduction rules in some generic function calls.

Tetralux commented 4 years ago

@AssortedFantasy

Here its not immediately obvious how the compiler would deduce what $T is. Clearly something must happen but who knows?

The example uses $T everywhere; my example, which follows Odin, is more reasonable.

In that case, only one of those Ts actually has the $, and whichever one it is determines what the type of T is.

Does this mean that $T should be @TypeOf(l)? What if I would rather this all coerce to @TypeOf(r)?

You could just make the signature muladd(l: T, r: $T, a: T) T instead. Now it's r that determines the type of them all. In Odin, this doesn't work because you cannot refer to a type variable which is declared later on in the signature, but maybe Zig could allow this to work.

Qix- commented 4 years ago

I'm not a huge fan of the single $T with bare T references - the intent of using the variables is that any use of $T refers to the type variable itself, and there's no other way to refer to it.

Further, there wouldn't be a single parameter that determined the type; instead, Zig would ensure that all declarations with the type variable had equal types.

I feel pretty strongly that a single $T followed by references to T would be confusing syntax and thus less helpful than the status quo. I'd personally prefer to require that all references to a type variable include the $ character.

Tetralux commented 4 years ago

Zig would ensure that all declarations with the type variable had equal types.

That's what $T followed by T does though?

AFAICT, it's the same as now with comptime T: type, except that you don't have to pass the type as an extra argument.

Qix- commented 4 years ago

@Tetralux Yes, it can be done that way, but I feel pretty strongly that it creates more confusion than it solves by allowing $ in some places and not in others. All usages of the type variable should have $ in order to remain consistent.

BarabasGitHub commented 4 years ago

namely when chaining generic functions and mixing var and comptime T: type.

Yes, I'm arguing that var is the problem here. It's better to be explicit, especially when things get complicated, because if you make a mistake somewhere and everything is a generic 'whatever', it can be very hard to figure out what's going on. For example, this:

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

I agree this is horrible and I'd argue that you should do it like so:

fn muladd(comptime T: type, l: T, r: T, a: T) T {
    // Extreme example, of course.
    return add(T, mul(T, l, r), a);
}

Not so bad, is it? It's also clear that you're calling generic functions, because you're passing a type. Instead, if you do everything with var, it becomes like C++ templates, but worse, because you have no named types. Or, with your suggestion, it is pretty much like C++ templates. (And for those who don't know, C++ templates are hell to debug and are why you sometimes get 20 pages of text for one compile error.)

And then you have questions like: what happens if I call add(@as(i10, 1), @as(i5, 2)) or add(@as(i5, 1), @as(u10, 2))? With var the first will work, the second won't. With your suggestion I honestly don't know; I hope for a compile error due to ambiguity. With explicit types it's obvious, and I can even do add(u15, @as(i5, 1), @as(i10, 2)).

Tetralux commented 4 years ago

@BarabasGitHub

Or with your suggestion it is pretty much like C++ templates.

(And for those who don't know, C++ templates are hell to debug and why you sometimes get 20 pages of text for one compile error.)

I find it interesting that you feel this way, considering that Odin has a very similar approach to what's being suggested here, yet has significantly better error messages. I've never had problems figuring out what was wrong from them, so I suspect your objection is unfounded. C++ is a different story though - those errors are complete garbage.

BarabasGitHub commented 4 years ago

@Tetralux I have no experience with Odin, so it's possible that it can be made to work. Personally I prefer being explicit and keeping things simple. Sometimes it can be a bit more work to type, but it is often easier to understand when reading.

One function in Zig std where I think an explicit parameter might have been better is testing.expectEqual.

Half the time you end up writing this:

    testing.expectEqual(@as(usize, 0), container.size());

while you might as well do

    testing.expectEqual(usize, 0, container.size());

Of course that also makes tests like these longer:

    testing.expectEqual(test_values.len, container.span("a").len);

which now needs to be:

    testing.expectEqual(usize, test_values.len, container.span("a").len);

but I don't feel that's all that bad.

thejoshwolfe commented 3 years ago

There are 3 different APIs you can make for functions that take multiple parameters of the same generic type, and they each have different semantics. Here they are using existing Zig features:

// Explicit type parameter
fn add(comptime T: type, a: T, b: T) T {
    return a + b;
}

// First parameter determines the type
fn add(a: anytype, b: @TypeOf(a)) @TypeOf(a) {
    return a + b;
}

// Peer type resolution
inline fn add(a: anytype, b: anytype) @TypeOf(a, b) {
    // This intermediate function only exists to facilitate peer type resolution between a and b,
    // then implicitly coerce each value to the type.
    return add_impl(@TypeOf(a, b), a, b);
}
fn add_impl(comptime T: type, a: T, b: T) T {
    return a + b;
}

There were a few ideas in the original post of this discussion and in follow-up comments that would give syntactic sugar for the non-explicit versions above:

// First parameter determines the type
fn add(a: $T, b: T) T {
    return a + b;
}

// Peer type resolution
fn add(a: $T, b: $T) $T {
    return a + b;
}

In order for syntactic sugar to be added to the language, it needs to encourage programmers to more clearly convey intent, write more correct code (avoid bugs), or generally align with paradigms idiomatic to Zig. In addition, the use case needs to be common enough to justify the increased complexity of the Zig language.

While this proposal does encourage more clearly conveying intent and quite possibly avoiding bugs, there is one major problem. This proposal is just the tip of the iceberg of pattern matching features that should probably follow.

Here are more pattern matching features I would expect if I can declare a parameter of type $T:

  1. declare a parameter []$T
  2. declare a parameter *const [$N]$T
  3. declare a parameter ?$T
  4. declare a parameter either []$T or []const $T determined at the callsite.
  5. declare a local variable of type $T, and then declare other variables of the same type.
  6. declare a parameter of any type that supports index subscripting and .len property access (slices, pointers to arrays, etc.).
  7. declare a parameter of any type that supports the + and * operators.
  8. declare a parameter of any type that has a fn next($T) ?$V method.
  9. and more

We can see examples of metaprogramming languages that have attempted to accomplish these use cases, such as C++'s templates, Rust's trait bounds, and even Java's bounded type parameters. Each metaprogramming language has a different set of features (see Rust's where clauses or C++'s variadic templates), and in each case the metaprogramming language is a different language from the programming language it's a part of, often employing <angle brackets> where they are otherwise not used.

If we were to add a robust pattern-matching metaprogramming system to Zig, it would add a huge amount of complexity, to the point where I would consider it an additional language within Zig. But everything that that additional language would do is already possible to write today using regular imperative Zig code that runs at compile time. You can even import third-party metaprogramming libraries that implement arbitrarily sophisticated pattern matching.

Every one of the numbered use cases above can be implemented by a comptime function that takes a type and emits a compile error if it doesn't conform to some constraints.
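
For instance, here is a hedged sketch of case 7 from the list above (a parameter of any type that supports the + and * operators); RequireNumeric is a hypothetical helper, not an existing std function:

// The constraint lives in the signature, so a bad call site fails while the
// signature is being analyzed, with a targeted message.
fn RequireNumeric(comptime T: type) type {
    return switch (@typeInfo(T)) {
        .Int, .Float, .ComptimeInt, .ComptimeFloat => T,
        else => @compileError("expected a numeric type, found " ++ @typeName(T)),
    };
}

fn add(a: anytype, b: @TypeOf(a)) RequireNumeric(@TypeOf(a)) {
    return a + b;
}

// add(1, 2)     //-> 3
// add("a", "b") //-> compile error ("expected a numeric type, found ...")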

The only drawbacks to using a userspace solution for pattern matching are that the compile errors are a little more confusing, and that the function signature does not encode any rules using the Zig type system; you have to rely on the function's documentation to know how to call it.

There's a bit of a fork in the road for the future of Zig. We can either attempt to implement a sophisticated metaprogramming language like most other modern languages have done, or we can just have anytype and offload all the metaprogramming responsibilities to userspace. Given that the latter is always going to be possible with comptime code, there is a significant barrier to accepting any syntactic sugar for pattern matching.