Closed dhil closed 5 years ago
Weirdly, if I do this one line at a time in the REPL, the right thing happens - i.e. you get an error
links> typename Foo = Int;
Foo = Int
links> typename Foo = Bool;
Foo = Bool
links> sig f : (Foo) {}~> Int
...... fun f(foo) { foo };
<stdin>:2: Type error: The non-recursive function definition has return type
`Foo'
but its annotation has return type
`Int'
In expression: fun f(foo) { foo }.
Weirdly, if I do this one line at a time in the REPL, the right thing happens - i.e. you get an error
Using the REPL the refineBindings
transformation only operates on singleton lists (you can only type one binding at a time) , so, it can not reorder the bindings (that is what is happening now, because it conflates duplicate declarations).
I have a patch now that makes duplicate type names in same scope an error.
For example, for the following program
typename Foo = Int;
typename Foo = Float;
sig f : (Foo) {}~> Int
fun f(foo) { foo }
typename Foo = Bool;
Links now produces the following error
Syntax error: The type name Foo is declared multiple times.
First declaration at file test.links:8:
- typename Foo = Int
Subsequent declaration at file test.links:10:
- typename Foo = Float
Subsequent declaration at file test.links:17:
- typename Foo = Bool
Great, that sounds like a more sensible solution. Is this part of your larger nominal typename patch?
Great, that sounds like a more sensible solution. Is this part of your larger nominal typename patch?
I intend to separate them out. There is a little bit more work to be done on refineBindings
. I intend to upstream that work first.
For value bindings I have adopted a similar rule, for example, the following program:
fun a() { println("a1") }
fun b() { a(); c() }
fun a() { println("a2") }
fun c() { a() }
fails to compile with the following error:
Syntax error: The function a is declared multiple times within the same binding group.
First declaration at file test.links:19:
- fun a() { println("a1") }
Subsequent declaration at file test.links:21:
- fun a() { println("a2") }
That is to ensure there is no ambiguity within binding groups (informally: a binding group is a sequence of syntactic similar bindings).
Note that the rule for recursive functions only applies on a binding group basis, so the following is a valid (and unambiguous):
fun a() { println("a1") }
fun b() { a(); c() }
fun c() { a() }
var x = 42;
fun a() { println("a2") }
This program has three binding groups: [a, b, c]
, [x]
, and [a]
. The two a
s are distinguishable because they belong to different (syntactic) binding groups.
Is there a particular reason to allow this? If
fun a() { println("a1") }
fun b() { a(); c() }
fun c() { a() }
var x = 42;
fun a() { println("a2") }
appears inside a module (that exports a
) then it seems ambiguous which a
is exported (I guess the last one wins). The added flexibility of being able to do this doesn't seem worth it relative to the potential for confusion.
The intended behaviour is to support shadowing. The behaviour of the bug you're seeing is unsurprising to me. I suspect that the list of type bindings is being reversed somewhere. Glancing at the code in refineBindings
it looks like this is probably what's going on. When one uses fold_right
to traverse and reconstruct a list the resulting list is reversed. This is accounted for (with a comment!) in the refineBindings
function, but not subsequently. I strongly suspect this is a regression introduced by the desugaring of mutually recursive types.
Take home message: always think hard about whether it's necessary to reverse a list generated by List.fold_right
and if in doubt you probably should.
+1 for shadowing
Regarding the reversal: something's odd, since fold_right
preserves the order (i.e. fold_right cons - []
is the identity).
Yes, the point of reversing after fold is to use List.fold_left
which is tail recursive. If you're not dealing with super long lists then use List.fold_right
and don't reverse.
@yallop oops yes. Brain fart. Everything I said would have made sense if I'd been talking about List.fold_left
.
I suspect that the list of type bindings is being reversed somewhere.
For the record, there's one production in the parser that produces a list in reversed order:
handle_params:
| rev(separated_nonempty_list(COMMA,
separated_pair(logical_expression, RARROW, pattern))) { $1 }
@jamescheney Yes, there is a reason to allow the above code. If we disallowed it then the semantics of the REPL would unnecessarily diverge from that of whole programs. For instance, it is nice that we can write
fun a() { println("a1") }
fun b() { a(); c() }
fun c() { a() }
var x = 42;
followed by
fun a() { println("a2") }
in the REPL, and that it behaves the same as:
fun a() { println("a1") }
fun b() { a(); c() }
fun c() { a() }
var x = 42;
fun a() { println("a2") }
OK. I think in (Oca)ML there is the same behavior but the fact that you have to explicitly write "and" in between mutually recursive functions makes it clearer what is going on and easier to tell if you are rearranging code in a binding-preserving way With the Links design, if I comment out the "var x = 42" line, or inadvertently move the last "fun a" line above the "var x" line without noticing the earlier "var a", I (currently don't but should) get an error. In general, if we want shadowing / order-sensitive binding, then making the structure of mutually recursive function groups explicit in the syntax seems desirable, otherwise the Haskell-style rule where the order of declarations does not matter and duplicating names in the same scope is not allowed should be followed, and we should have a workaround to allow rebinding in the REPL, like Haskell also does.
Also. in the REPL, you can't actually combine a function group and a "var" declaration group in a single "line" anyway:
links> fun a() { println("a1") }
...... fun b() { a(); c() }
...... fun c() { a() }
...... var x = 42;
*** Parse error: <stdin>:4
var x = 42;
So I'm not entirely persuaded that this is an important use case since we already can't do it.
If we disallowed it then the semantics of the REPL would unnecessarily diverge from that of whole programs.
From a perspective of some who writes a compiler this is indeed a complication. But from a perspective of a programmer that actually uses the language I have always found this kind of shadowing to be a major nuisance. So I concur with @jamescheney that:
making the structure of mutually recursive function groups explicit in the syntax seems desirable, otherwise the Haskell-style rule where the order of declarations does not matter and duplicating names in the same scope is not allowed should be followed, and we should have a workaround to allow rebinding in the REPL, like Haskell also does.
If we want to support proper shadowing, should we then make mutuality amongst bindings explicit in the syntax? We could adopt the ML convention, and introduce and
as a keyword:
typename Tree(a)= [|Empty|Node:(a, Forest(a))|];
and Forest(a) = [|Nil|Cons:(Tree(a), Forest(a))|];
fun even(n) {
if (n == 0) true
else odd(n-1)
} and odd(n) {
if (n == 0) false
else even(n-1)
}
Then there is question about how we deal syntactically with mutual recursive functions that have an explicit signature, we could possibly stack them at the beginning of the mutual bindings:
sig odd : (Int) ~> Bool
sig even : (Int) ~> Bool
fun even(n) {
if (n == 0) true
else odd(n-1)
} and odd(n) {
if (n == 0) false
else even(n-1)
}
Note, currently a signature must immediately precede its implementation.
Note, currently a signature must immediately precede its implementation.
That sounds like a restriction that could possibly be loosened. It would also be nice to be able to group together signatures for several bindings of the same type, a'la Haskell:
eqString_RDR, unpackCString_RDR, unpackCStringFoldr_RDR,
unpackCStringUtf8_RDR :: RdrName
eqString_RDR = nameRdrName eqStringName
unpackCString_RDR = nameRdrName unpackCStringName
unpackCStringFoldr_RDR = nameRdrName unpackCStringFoldrName
unpackCStringUtf8_RDR = nameRdrName unpackCStringUtf8Name
That sounds like a restriction that could possibly be loosened.
Yep, I believe that would require a dedicated node for signatures in the AST. As of writing, signatures are attached to their implementations by the parser. Due to our current representation of names in the frontend and our (intended) shadowing semantics, it is not sufficient to build up a simple map from names to signatures during parsing.
This issue has grown arms and legs. I've started a separate issue #457 for discussion of whether/how to alter the syntax and handling of mutually recursive functions, which would be an enhancement rather than bugfix (i.e. there is no urgency to resolve #457 as part of this issue).
(just to be clear, I know it's my fault that the issue grew arms and legs, I should probably have started the separate issue for the letrec discussion earlier).
Shadowing of type names is, to put it mildly, confusing.
Somewhat surprisingly, the following program type checks:
Reading the program top-down, I would expect to see a type error. I expect the type of
f
to be structurally equivalent to(Bool) {}~> Int
, it so happens that it isn't. It is equivalent to(Int) {}~> Int
, which is telling that Links does not respect shadowing of type names. Either we should prohibit duplicate declarations or support shadowing in the same scope.The reason why the first
Foo
is picked over the secondFoo
) is due torefineBindings.ml
.