Closed: andrewrk closed this issue 5 years ago.
I'm mentioning it for the sake of completeness, not as a serious suggestion, but the ambiguity can be resolved with arbitrary look-ahead / back-tracking.
For tooling reasons, though, you probably want Zig's grammar to be LL(1) or LR(1).
As an aside, I do like having the `->` before the return type, just for consistency with similarly recent languages (Swift, Rust) that employ this syntax. I find it makes reading code a bit easier.
That said, for this issue it seems the problem is figuring out where the return type declaration ends and the body begins. The main flaw is that `{` is a valid token inside the return type declaration.
Possible solutions I see:
1) Introduce a token to designate the end of the return type and the start of the body. For example, `=`, as in:
const abc = fn(x: X, y: Y) -> Z = {
// function body
}
I don't think this is a very good idea. It deviates too much from what everyone is used to in similar languages, and adds needless clutter to a lot of code just to avoid an ambiguity that only arises some of the time.
2) Forbid `{` inside the return type unless it is surrounded by parentheses. For example:
fn(...) -> (A{}) { // the return type is `A{}`
}
fn(...) -> Z { // the return type is just `Z`
}
This avoids the ambiguity without requiring the `(..)` to always be present in the return type, but it potentially violates the "only one obvious way of doing things" principle and can cause "code style" wars, with one camp saying "you should always use parentheses around your return types" and another saying "never put parentheses around return types, and never put `{` inside the return types; always use type aliases".
Which brings me to the next idea:
3) Forbid `{` inside the return type completely; require the use of type aliases.
If you need a function to return a type that requires `{` in its definition, the only way to return such a type is to give it an alias first. This can be good for readability, and it avoids introducing multiple ways of doing the same thing.
I don't have other ideas at the moment.
@hasenj Your first suggestion is pretty ok if we can declutter it a bit. With number 3, readability can be hindered by forcing everything to have a name/alias. Number 2 seems like a good compromise.
As an iteration on the first, using `=>` to denote the start of the function body:
const S = struct {
someFn: fn(x: X, y: Y) error{} ! Z,
foo: u8
}
fn bar(x: X, y: Y) error{} ! Z => {
// function body
}
// Single expression functions!
fn barWhenXEquals2(y: Y) !Z => try bar(2, y);
const s = S {.someFn = bar, .foo = 0 };
@raulgrell taking your `=>` suggestion to its natural conclusion:
if x + y > 10 => {
} else {
}
if foo() => |value| {
} else |err| {
}
while it : |node| it = node.next => {
}
while x < 10 : x += 1 => {
}
switch expressions already match this pattern.
I'm not so sure about this. I'd like to explore changing container initialization syntax and container declaration syntax to disambiguate.
To clarify: `()` on `if`, `while`, `switch`, `for`, etc. is solving the same syntactic ambiguity. So for certain we will do one of these two things:
- require `()` on return types, just like `if` and friends
- solve the syntactic ambiguity a different way, and drop the `()` on `if` and friends
I'd love to rid `if` and friends of their parentheses, so my vote is strongly for the `=>`.
Curly brackets for container initialization are common in C, C++, Rust and C#. On the other hand, of those, only Rust allows omitting the parentheses in if statements.
Alternative container initialization methods include:
const S = struct {a: u8, b: u8};
// Some of these would probably mean adding named parameters in function calls
// keyword - it's a familiar keyword, but doesn't really look right in the language
const s = new S(.a = 0, .b = 1);
// builtin type function - more consistent with other language features. other function names
// can include make, create, init, initialize
const s = S.new(.a = 0, .b = 1);
// Callable type with named parameters - builtin types are callable, so there's some
// consistency here.
const s = S(.a = 0, .b = 1);
// We could probably do with compile time enforcement of Capitalized struct names if
// calling with parentheses or square brackets. On the other hand, it would mean
// comptime created types could look awkward
fn V(comptime T: type) type {
return struct { value: T };
}
const vs = V(.T = S)(.a = 0, .b = 1);
// We could use square brackets for type initialization
const s = S[.a = 0, .b = 1];
const vs = V(.T = S)[.a = 0, .b = 1];
// named initializers - I'm not sure how we'd define an initializer outside of the struct's scope.
// I do have a slightly more complicated idea that involves enum arrays, but I'll
// get to that later
const T = struct {
a: u8,
b: u8,
new zeroes => { .a = 0, .b = 0 }
}
const t_zeroes = T::zeroes;
// builtin function
const t_ones = @new(T) { .a = 1, .b = 1};
// with named args
const t_ones = @new(T, .a = 1, .b = 1);
This is the cleanest iteration of the above ideas I have found, using the new keyword and changing declaration syntax.
Declaration syntax changes to use parentheses instead of curly brackets, and square brackets instead of parentheses. Initialization requires `new (T) { .fields = t }`. Somewhat inspired by #661.
const S = struct(a: u8, b: u8);
const s = new(S) {.a = 0, .b = 0};
fn G(comptime T: type) type { return struct( a: T ); }
const g = new (G(S)) { .a = s };
const E = enum(A, B);
const TE = enum[u8] (A = 3, B = 4);
error NoMem;
const errorSet = error(NoMem);
const C = union[E] ( A: u8, B: void);
const c = new (C) { .A = 0 };
const U = union[enum] (Number: f64, String: []const u8, None);
const n = new (U) { .Number = 0.99};
// This is a function that returns A with an empty body
fn foo() A { }
// This is a function that returns the struct literal and an empty body
fn foo() struct(a: u8) { }
// This is a function that returns the struct literal A() and an empty body
fn foo() A() { }
// This is a function that returns an error with an empty body
fn foo() error {}
// This is a function that returns empty error set and an empty body
fn foo() error() {}
// This is fine
fn foo() error()!void {}
Actually, with these changes to declaration, I think initialization syntax can stay the same. I'm just not sure about the "struct literal A()" bit.
const s = S {.a = 0, .b = 0};
const g = G(S) { .a = s };
const c = C { .A = 0 };
const n = U { .Number = 0.99};
Here's my proposal:
- `.` before the `{` when instantiating types and values
- `()` for `if`, `while`, `for`, `switch`, etc.
const Foo = struct.{
a: i32,
b: i32,
};
const Id = enum.{
One,
Two,
};
const Blah = union(enum).{
Derp,
Bar: struct.{
aoeu: f64,
},
};
const FileError = error.{
OutOfMemory,
AccessDenied,
};
test "aoeu" {
var array = []u8.{'a', 'b', 'c', 'd'};
var x = Foo.{
.a = 1234,
.b = 5678,
};
if x.a == 1234 {
while true {
eatStinkyCheese();
}
}
}
This fixes all the parsing ambiguity. Now a `{` preceded by a `.` is always type/value instantiation, and a `{` not preceded by a `.` is always the beginning of a block.
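To make the rule concrete, here is a small Python sketch that labels each `{` by looking only at the token immediately before it. It is not part of Zig. The regex tokenizer is a simplifying assumption that ignores string contents and comments, so a `{` inside a string literal would be misclassified.

```python
import re

# Toy tokenizer (an assumption, not Zig's lexer): identifiers/numbers,
# character literals, "==", and any other single non-space character.
TOKEN_RE = re.compile(r"\w+|'[^']*'|==|\S")


def classify_braces(src):
    """Label every '{' in src using only the token right before it.

    Under the proposed rule, '{' directly after '.' starts a type/value
    instantiation; any other '{' starts a block.
    """
    tokens = TOKEN_RE.findall(src)
    labels = []
    for i, tok in enumerate(tokens):
        if tok == "{":
            prev = tokens[i - 1] if i > 0 else None
            labels.append("instantiation" if prev == "." else "block")
    return labels


snippet = """
const Foo = struct.{ a: i32, };
test "aoeu" {
    var x = Foo.{ .a = 1234 };
    if x.a == 1234 { while true { } }
}
"""
print(classify_braces(snippet))
# ['instantiation', 'block', 'instantiation', 'block', 'block']
```

The decision needs exactly one token of look-behind, which is why this rule keeps the grammar within a small fixed look-ahead.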
It looks a little awkward, but I guess that's no reason not to do it. That said, I didn't see discussion on reintroducing `->`. Are there drawbacks besides extra typing?
Not really. I removed `->` because, with mandatory return types, it was rendered unnecessary.
`->` does not solve the return type syntax problem, though. Whether or not we have `->`, there is still the `fn foo() -> error { } { }` ambiguity.
Ah right. And mandating parens around the return type makes for bad error messages if you forget them. I see the issue now.
I was wondering whether it would be beneficial to change the built-in field/reflection syntax from the regular scope accessor `.` to a new operator like `::`, i.e. `Function.return_type` becomes `Function::return_type` and `slice.len` becomes `slice::len`. This signals to the reader that whatever you're accessing is something built-in.
Then keep your proposal and make container instantiation require a `::` followed by a `{`:
const Blah = union(enum)::{
Derp,
Bar: struct::{
aoeu: f64,
},
};
Ok with the proposal, but:
var x = Foo.{
.a = 1234,
.b = 5678,
};
Do we still need dot a, dot b here? We already have a dot after Foo.
Only because this was mentioned in a separate issue, I just wanted to voice my support for your most recent proposal; disregard my `::` suggestions, as I've come to appreciate the value of the universal `.` operator.
Regarding @jido's point about the dot (ha, pun), I think they are consistent with the rest of the language.
// Here there is no dot because the identifier is being declared
const Foo = struct.{
a: i32,
b: i32,
};
// Here, we would have a dot because the name already exists
var x = Foo.{
.a = 1234,
.b = 5678,
};
If we consider the enum type inference in switches proposed in #683, that proposal mentions issues with shadowing without the `.`. This would also be consistent with other parts of the language:
const Value = enum(u32).{
Hundred = 100,
Thousand = 1000,
Million = 1000000,
};
const v = Value.Hundred;
switch (v) {
.Hundred => {},
.Thousand => {},
.Million => {}
}
Thanks for doing that work, @Hejsil .
To those who prefer the previous syntax, I empathize because I also prefer the previous syntax. However parsing ambiguities are simply not acceptable. And so we're stuck with the least worst solution. Proposals for different syntax/solutions are welcome, provided that they are better than status quo and do not introduce any syntax ambiguities.
@andrewrk as I mentioned in my previous comment in #1628, using the dot here harms readability: it's usually the member access operator, and here it's overloaded with two extra meanings, `define` and `new`.
If I'm getting this right, the problem is to solve the ambiguity of expressions like this
fn foo() A {}
fn foo() error{}!void {}
All we need is to surround the return type with symbols that are not used anywhere else. How about:
fn foo() `A {}`
fn foo() `error{}!void` {}
I much prefer to require `()` on return types and keep them on `if`, `while`, `switch`, `for`, etc., as in C-like languages, rather than create a new meaning for `.`.
@allochi I'm going to amend your proposal and change it from surrounding the return type with backticks to surrounding it with parentheses. I think that's actually a reasonable proposal. If we end up doing named return values to solve #287, then parentheses around return values is especially reasonable.
How about in case of ambiguity use a dot?
fn foo() A {} // function returns A
fn foo() A{} {} // compile error
fn foo() A.{} {} // function returns A{} struct literal
Now everything stays the same for the most part, assuming this works
Here's how I'm thinking it:
{} is always the body of what's to the left of it.
const A = struct {} // works fine
fn foo() {} // easily detects that return type is missing
The function decl parsing code is special cased to expect the {} to always refer to its body, and it must be escaped if you want to refer to something else.
fn foo() A.{} {}
If you are not using the dot I would use inset operators e.g.
const FileError = error in {
OutOfMemory,
AccessDenied,
};
test "aoeu" {
var array = []u8 of {'a', 'b', 'c', 'd'};
var x = Foo with {
.a = 1234,
.b = 5678,
};
if x.a == 1234 {
while true {
eatStinkyCheese();
}
}
}
How about in case of ambiguity use a dot?
I don't think this is compatible with the best way of solving the problem, which is that the parser is independent of the type system. Otherwise known as Context Free Grammar.
If you are not using the dot I would use inset operators e.g.
I don't think `in` as a keyword is better than `.`.
I don't think this is compatible with the best way of solving the problem, which is that the parser is independent of the type system. Otherwise known as Context Free Grammar.
All that's changed (I think) is types in function signatures have a different grammar, .{} instead of {}. I don't know language theory and I haven't thought this through either, but the change sounds benign.
Edit: I'm going to embarrass myself, but a mini language:
Decls -> Decl Decls | Decl
Decl -> Func | Var
Func -> fn Ident () Type2 FuncBody
FuncBody -> { Variable }
Variable -> var Ident : Type;
Type -> Ident {}
Type2 -> Ident . {}
@allochi I'm going to amend your proposal and change it from surrounding the return type with backticks to surrounding it with parentheses. I think that's actually a reasonable proposal. If we end up doing named return values to solve #287, then parentheses around return values is especially reasonable.
@andrewrk that is better than what I proposed; I suggested backticks because I thought parentheses were out of the question if parsing was to stay easy.
I use parentheses with Go, and they don't bother me, although in Go they're only required for multiple or named return values. With parentheses we would have the following, which is quite reasonable:
fn foo() (A{})
fn foo() (error{}!void) {}
pub fn main() (!void) {
var i: i32 = 42;
}
We can also make zig artistic by introducing
fn foo() ._=|A{}|=_.
fn foo() ._=|error{}!void|=_. {}
Just joking, or maybe not, humm...
Here is another way we could go. What causes ambiguity is initializers and error sets. struct/union/enum type decls are not ambiguous, as they are required to be followed by an lbrace.
fn a() struct {} {}
fn b() union {} {}
fn c() enum {} {}
Error sets are ambiguous because the `error` type is a thing.
// is this fn()error or fn()error{} ?
fn d() error {}
We could rename `error` to `anyerror` or something, and then error sets are also not ambiguous.
Now, we just need to solve initializers. @andrewrk and I discussed removing our current initializers, in favor of inferred initializers and implicit cast.
Point(.{.x = 1, .y = 2})
@Hejsil nice!
But the language allows for this anonymous struct definition in a return type of a function
fn something() ?struct{ x:i32 } {
return null;
}
I don't know the point or use cases of this, and I'm not aware of all of Zig's rules yet, but this compiles, and it shouldn't IMHO.
@andrewrk and I discussed removing our current initializers, in favor of inferred initializers and implicit cast.
Point(.{.x = 1, .y = 2})
I hope you mean going back to the original form:
Point{.x = 1, .y = 2}
`(.{})` is slightly cluttered.
@allochi Your first example is unambiguous. The first pair of braces always has to be part of the struct decl. There is no other way to parse this code.
We can't have `Point{.x = 1, .y = 2}` with my proposal, because it is ambiguous.
// Is this a fn()Point or is it a fn()Point{} ?
fn a() Point {} {}
I feel I'm missing a couple of points here:
Are we talking about initialization when we talk about `Point(.{.x = 1, .y = 2})`? Like, we can't have this back?
const point = Point{.x = 1, .y = 2};
we will have this instead?
const point = Point(.{.x = 1, .y = 2});
Are return types only ambiguous when they have empty brackets? Like this:
fn foo() A{} {
...
}
but not like this?
fn foo() A{ x: i32 } {
...
}
What I mean:
a. Is a struct definition not a problem when it's not empty?
b. But is it a problem when it's empty, because we don't know if it's an empty struct `A{}` or an initialization `A{}`?
But then in the context of a function definition at return types, why would we have an initializer? and why would we have empty structs/union/enum as return types? what would be the use of them?
Why allow this in the compiler?
fn something() ?struct{} {
return null;
}
Yes, that's what we're talking about. We can't have `Point{.x = 1, .y = 2}` currently because it is ambiguous during parsing.
Well, in theory, yes, return types are only ambiguous with empty brackets, but it is a little more complicated than this. We're trying to keep the grammar of the language simple, so we keep the lookahead requirements as small as possible (currently, we need 2 token lookahead to parse Zig).
a. No, struct/union/enum is never a problem. The reason for this is that they are required to have braces. You can't have the struct/union/enum keywords without them being followed by braces, as that is invalid syntax. Therefore, we always know where the braces belong:
// I'll add parens to show why this is not ambiguous
fn a() struct {} // error: expected function body
// fn a() (struct {})
fn a() struct {} {}
// fn a() (struct {}) {}
Now, let's look at initializers:
fn a() A {}
// fn a() (A) {}
// fn a() (A {})
// ^ Ambiguous because an initializer is optional
fn a() A {} {}
// fn a() (A) {} {}
// fn a() (A {}) {}
// ^ Ambiguous because an initializer is optional
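A toy backtracking recognizer makes the look-ahead problem concrete: once the whole input is known, only one reading survives, but the parser cannot choose early. The grammar in the docstring is a drastically reduced, hypothetical stand-in for Zig's. The Python helper `return_type_readings` is only an illustration.

```python
import re


def tokenize(src):
    # Toy tokenizer (an assumption): words and single punctuation characters.
    return re.findall(r"\w+|\S", src)


def brace_group_end(tokens, i):
    """If tokens[i] is '{', return the index just past its matching '}'."""
    if i >= len(tokens) or tokens[i] != "{":
        return None
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "{":
            depth += 1
        elif tokens[j] == "}":
            depth -= 1
            if depth == 0:
                return j + 1
    return None  # unbalanced braces


def return_type_readings(src):
    """Try every reading of the return type and keep those that parse.

    Hypothetical, heavily reduced grammar:
        FnDecl  := "fn" IDENT "(" ")" RetType Body
        RetType := IDENT | IDENT BraceGroup   # initializer is optional
        Body    := BraceGroup                  # must end the input
    """
    tokens = tokenize(src)
    if len(tokens) < 5 or tokens[0] != "fn" or tokens[2:4] != ["(", ")"]:
        raise ValueError("expected: fn NAME ( ) RET ...")
    ret_start = 4
    body_starts = [ret_start + 1]                  # RetType = IDENT
    init_end = brace_group_end(tokens, ret_start + 1)
    if init_end is not None:
        body_starts.append(init_end)               # RetType = IDENT {...}
    readings = []
    for body_start in body_starts:                 # backtrack over both choices
        if brace_group_end(tokens, body_start) == len(tokens):
            readings.append(" ".join(tokens[ret_start:body_start]))
    return readings


print(return_type_readings("fn a() A {}"))              # ['A']
print(return_type_readings("fn a() A {} {}"))           # ['A { }']
print(return_type_readings("fn a() A { x; y; } {}"))    # ['A { x ; y ; }']
```

All three inputs begin with the same tokens `fn a ( ) A {`. Which reading applies is only settled after the matching `}`, however far away it is. This is the unbounded look-ahead/back-tracking that a grammar with a small fixed look-ahead is meant to avoid.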
The reason this is allowed is that the grammar is something like this:
"fn" <Ident> "(" List("name" ":" <Expr>) ")" <Expr> <Body>
We need the grammar to be like this because any expression could satisfy `@typeOf(expr) == type`, and the function requires that the return type expression be of type `type`. This allows Zig generics to work the way they do. A generic data structure in Zig is a struct returned from a function taking a type.
fn Point(comptime T: type) type {
return struct { x: T, y: T };
}
As for the last example: any programming language allows a lot of weird programs to compile. A programming language cannot be Turing complete if you cannot write an ugly hack in it (I have no proof for this, but I'm pretty sure this is true).
So can't we just come up with an initialization for empty struct only?
I'm assuming that in the parsing phase (I really need to dig more into the Zig grammar and compiler), if we find a dot right after the opening brace, `{.`, we conclude it's an initialization, and if not, it's a definition (am I sort of right?), so:
empty vs non-empty struct definition would look like
A{} // empty struct definition
B{ x: i32} // non-empty struct definition
while initialization looks like
A{.} // empty struct initialization
B{ .x = 42 } // non-empty struct initialization
This way, I think we don't even need parentheses around return types (maybe); we would already know that `A{}` is a definition and `A{.}` is an initialization.
What do you think? And thanks for your patience.
@allochi No problem. We have these issues so we can discuss and come up with solutions for Zigs problems :)
I still think there is a problem. Consider if we had inferred initializers as they are currently proposed:
fn a() A { .{}; }
We would need more than the `.` to solve this (or we could change the inferred initializer syntax).
What about moving the return type definition into the parens of the function declaration?
// status quo
fn sum(a: i32, b: i32) i32 { return a + b; }
// return type moved into the parens
fn sum(a: i32, b: i32 -> i32) { return a + b; }
// longer signature
fn quiteTheLongFunctionName(
the_first_argument: SomeElaborateType,
another_argument: AnotherType,
-> TheResultType) {
return /* ... */;
}
// Again with inline error sets
fn quiteTheLongFunctionName(
the_first_argument: SomeElaborateType,
another_argument: AnotherType,
-> error {
FirstFailure,
SecondFailure,
}!TheResultType) {
return /* ... */;
}
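This removes the ambiguity because the body can only start at the first token after the `)` that closes the parameter list. Any `{` inside the return type sits at a deeper nesting level. Below is a minimal Python sketch of that scan. It relies on a hypothetical `split_signature` helper and a toy tokenizer, neither of which is part of Zig.

```python
import re

# Toy tokenizer (an assumption): words, the "->" arrow, single characters.
TOKEN_RE = re.compile(r"\w+|->|\S")
OPENERS = ("(", "{", "[")
CLOSERS = (")", "}", "]")


def split_signature(src):
    """Split `fn NAME(params -> ret) { body }` (hypothetical helper).

    Returns (params, return_type, body_index); return_type is None when no
    `->` appears inside the parameter parentheses.
    """
    tokens = TOKEN_RE.findall(src)
    if len(tokens) < 3 or tokens[0] != "fn" or tokens[2] != "(":
        raise ValueError("expected: fn NAME (")
    depth = 0
    arrow = None
    for i in range(2, len(tokens)):
        tok = tokens[i]
        if tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            depth -= 1
            if depth == 0:
                close = i
                break
        elif tok == "->" and depth == 1:
            arrow = i  # only an arrow at the top nesting level counts
    else:
        raise ValueError("unbalanced parameter list")
    # The body is always the first token after the closing ')'; any '{'
    # inside the return type was at depth >= 2 and cannot be mistaken for it.
    if close + 1 >= len(tokens) or tokens[close + 1] != "{":
        raise ValueError("expected '{' after ')'")
    end = arrow if arrow is not None else close
    params = " ".join(tokens[3:end])
    ret = " ".join(tokens[arrow + 1:close]) if arrow is not None else None
    return params, ret, close + 1


print(split_signature("fn sum(a: i32, b: i32 -> i32) { return a + b; }"))
# ('a : i32 , b : i32', 'i32', 13)
print(split_signature("fn f(x: X -> error { A, B }!R) { }"))
# ('x : X', 'error { A , B } ! R', 16)
```

The inline error set `error { A, B }` is handled with no special casing, because its braces are nested inside the parameter parentheses.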
@Manuzor This indeed does solve the problem. Idk if this deviates too much from calling a function, but it's definitely not bad :thinking:
@Hejsil Thanks. If the dot is a problem in `{.}`, maybe another symbol (`{0}`, `{..}` or `{~}`) could mark the empty initializer. This way we can remove the leading dot before braces for initializers.
@Manuzor PostgreSQL functions use a similar style, and I can't say it's nice to read:
CREATE OR REPLACE FUNCTION fn_sqltestout(param_subject text,
OUT subject_scramble text, OUT subject_char text)
This could easily lead to clutter between the parentheses: it could take some time to read and understand a function signature and find out which is which.
I propose that we use a different symbol for initialization to remove confusion between defining the type and initializing.
The `#` sign is unused:
var p: Point = #{.x = 1, .y = 0}; // pound sign
Also the empty set seems like a corner case, maybe that should change
Adding to @allochi's list of empty initializers: `{-}`, `{!}`, `{_}`, `{=/=}`, `{#}`
@allochi I'm not sure how far syntax can help you reduce clutter in function signatures. We haven't even looked at qualifiers such as `extern "kernel32"` or the function calling convention here yet. They all add clutter.
Another alternative was mentioned before which enforces parens for the return type. It solves the ambiguity but it requires me to chase parens to figure out where the set of input parameters ends and the return types begin. I chose the arrow because it stands out.
In the case of multiple return values, my arrow-inside-parens version allows the syntax to extend naturally. You just continue specifying stuff separated by commas.
fn getMultipleResults(
a: FirstType,
b: SecondType,
-> error {
FirstError,
SecondError
}
! FirstResult,
SecondResult,
) {
// ...
}
Now that I'm looking at this, maybe we should consider swapping how return types and the error set are declared... I guess that's another proposal, though.
fn getMultipleResults(
a: FirstType,
b: SecondType,
-> FirstResult,
SecondResult,
! error {
FirstError,
SecondError
}) {
// ...
}
I have another idea. The only change from 0.3.0's syntax is to rename the `error` type to `anyerror` or whatever works best.
We split the expression syntax into two parts: `TypeExpr` and `Expr`. A `TypeExpr` is an expression that can return the type `type`.
[]<Attributes> <TypeExpr>
[*]<Attributes> <TypeExpr>
[<Expr>]<TypeExpr>
<Expr> "." <Symbol>
<Expr> "(" <Exprs> ")"
@<Symbol> "(" <Exprs> ")"
"(" <Expr> ")"
...
An `Expr` is all other expressions + `TypeExpr`.
<Expr> + <Expr>
...
<TypeExpr> "{" <Exprs> "}"
<TypeExpr> "{" <FieldInits> "}"
<TypeExpr>
...
We then change the function syntax to this:
"fn" <Symbol> "(" <Params> ")" <TypeExpr> "{" <Statements> "}"
And now, this is a compiler error:
fn a() A{} {} // error: expected top level decl, found "{"
// (fn a() (A) {}) {} // parens, to show how it was parsed
// Now, if a literal should be parsed for function return type, parens are required:
fn a() (A{}) {}
// Note that <Expr> "." <Symbol> is a <TypeExpr>. This is therefore possible:
fn a() A{.T = u8}.T {}
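Here is a rough Python sketch of the `TypeExpr`/`Expr` split, written as a toy recursive-descent parser over a heavily reduced, hypothetical grammar. It supports only empty parameter lists, one-argument calls, `.field` access and initializers, and leaves out `!` and other operators. The return type is parsed as a `TypeExpr`, which never consumes `{`, so the first `{` after it is always the body. One deliberate difference from the comment above: this sketch does not accept an initializer followed by `.Symbol`, so `fn a() A{.T = u8}.T {}` is rejected rather than parsed.

```python
import re

TOKEN_RE = re.compile(r"\w+|\S")  # toy tokenizer (an assumption)


class ParseError(Exception):
    pass


class Parser:
    """Toy recursive-descent parser; hypothetical, heavily reduced grammar:
        TopLevel := FnDecl*
        FnDecl   := "fn" IDENT "(" ")" TypeExpr Block
        TypeExpr := Primary ( "." IDENT | "(" Expr ")" )*
        Primary  := IDENT | "(" Expr ")"
        Expr     := TypeExpr [ BraceGroup ]   # initializers only in Expr
        Block    := BraceGroup
    """

    def __init__(self, src):
        self.toks = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise ParseError(f"expected identifier, found {tok!r}")
        self.pos += 1
        return tok

    def brace_group(self):
        # Consume a balanced "{ ... }" and return its tokens joined by spaces.
        start = self.pos
        self.expect("{")
        depth = 1
        while depth:
            if self.peek() is None:
                raise ParseError("unbalanced '{'")
            depth += {"{": 1, "}": -1}.get(self.peek(), 0)
            self.pos += 1
        return " ".join(self.toks[start:self.pos])

    def primary(self):
        if self.peek() == "(":
            self.pos += 1
            inner = self.expr()  # parentheses re-enable a full Expr
            self.expect(")")
            return f"({inner})"
        return self.ident()

    def type_expr(self):
        node = self.primary()
        while self.peek() in (".", "("):
            tok = self.peek()
            self.pos += 1
            if tok == ".":
                node += "." + self.ident()
            else:  # one-argument call, e.g. Point(i32)
                arg = self.expr()
                self.expect(")")
                node += f"({arg})"
        return node

    def expr(self):
        node = self.type_expr()
        if self.peek() == "{":  # initializer: allowed in Expr, never in TypeExpr
            node += " " + self.brace_group()
        return node

    def parse(self):
        decls = []
        while self.peek() is not None:
            if self.peek() != "fn":
                raise ParseError(f"expected top level decl, found {self.peek()!r}")
            self.pos += 1
            name = self.ident()
            self.expect("(")
            self.expect(")")
            ret = self.type_expr()  # never consumes '{', so the next '{' is the body
            self.brace_group()
            decls.append((name, ret))
        return decls


print(Parser("fn a() A {}").parse())             # [('a', 'A')]
print(Parser("fn a() (A{}) {}").parse())         # [('a', '(A { })')]
print(Parser("fn b() Point(i32).T {}").parse())  # [('b', 'Point(i32).T')]
try:
    Parser("fn a() A{} {}").parse()
except ParseError as err:
    print(err)  # expected top level decl, found '{'
```

With this split, each decision needs only the current token. A parenthesized return type re-enables initializers, which matches the `fn a() (A{}) {}` case above.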
And we could make the syntax for `try` this:
"try" <TypeExpr>
Which allows this:
try somefunc() + 1
// parsed as (try somefunc()) + 1
Maybe `TypeExpr` is the wrong name. It's an expression that can return all types, including `type`, whereas `Expr` also includes expressions that are limited to returning only certain types.
`<Expr> + <Expr>` always returns `uN` or `iN`.
`<TypeExpr>{<FieldInit>}` never returns `type`, `errorset`, `uN`, `iN`, etc.
Maybe `AnyTypeExpr` is a better name.
That looks quite sane.
I approve of https://github.com/ziglang/zig/issues/760#issuecomment-430938743. `anyerror` is a great name for the primitive type that represents the global error set.
So it's "fn foo() anyerror {}"?
For shorter typing you could have "fn foo() error {}" and "const FileOpenError = errorset {};"
@UniqueID1 It is encouraged/good style to use error sets more than `error`, so I think it makes sense to change `error`.
If the standalone `error` is the only case that causes ambiguity, then I also think special-casing it is a much better solution than anything else mentioned here before. The identifier `anyerror` states the intent very clearly.
@Hejsil Wanting to have (subjectively) uglier syntax for encouragement reasons is a valid viewpoint, if I'm not simplifying your viewpoint too much. I think it's risky, in a way. I think it'll encourage people to switch over from the global error set at the wrong time for the wrong reason.
When I say "at the wrong time", I'm channeling my inner Jonathan Blow, who has a way of writing really imperfect code at the right time, which works for him to stay hyper-productive. So if you're not handling errors yet, and you might not for years, then you'll use the global error set to keep yourself from going off on a tangent, such as having to define the errors correctly before you'll ever need to handle them.
@UniqueID1 I see your concern, but I'm pretty sure inferred error sets cover 99% of this use case (fast prototyping), because in 99% of cases you (or the compiler), knows which errors can be returned from a function.
If you don't care, then just add `!` to the return type of all functions that can fail. It's easier to type, and more correct than `anyerror`.
Ahh okay, also anyerror is very descriptive so that's a plus for it. It's actually not bad at all and it tells the newbies exactly what it does.
Alright, seems like we might have found a better solution than what was merged in #1628. Any objections should be voiced as soon as possible, as I wanna revert #1628, and start implementing the proposed solution.
One downside to this is that we have to reject #1659.
// Note that <Expr> "." <Symbol> is a <TypeExpr>. This is therefore possible:
fn a() A{.T = u8}.T {}
This doesn't seem right. Everything else about the proposed TypeExpr seems good, but I think this example should fail to parse.
The formal grammar specification for Zig isn't always up to date, but it might be able to shed light on this if it's updated with this proposal.
This is also not very important and shouldn't block any forward progress. This is just something to iron out before 1.0.0.
Fundamentally we have this ambiguity:
fn foo() A {}
Is this a function that returns `A` with an empty body? Or is this a function that returns the struct literal `A{}`, and we're about to parse the body?
Another example:
fn foo() error {}
Is this a function that returns `error` with an empty body, or one that returns the empty error set, and we're about to find the function body?
fn foo() error{}!void {}
The above is not allowed because Zig thinks the return type is `error` and the function has an empty body, and then is surprised by the `!`. So we have to do this:
fn foo() (error{}!void) {}
Unfortunate. And the compile errors for forgetting to put a return type on a function point at the wrong place.
Let's come up with a better syntax for return types. All of these issues are potentially on the table:
- `->` or `=>`, if it helps.
- `void` return types.
- `fn foo() !T {}` is special syntax that belongs to a function definition and marks the function as having an inferred error set. It doesn't necessarily have to be specified as it currently is, by leaving off the error set of the `!` binary operator.
The syntax change should be satisfying; let's make this the last time everybody has to update all their return type code.