links-lang / links

Links: Linking Theory to Practice for the Web
http://www.links-lang.org
Other
329 stars 42 forks source link

Typename shadowing #455

Closed dhil closed 5 years ago

dhil commented 5 years ago

Shadowing of type names is, to put it mildly, confusing.

Somewhat surprisingly, the following program type checks:

typename Foo = Int;
typename Foo = Bool;

sig f : (Foo) {}~> Int
fun f(foo) { foo }

links> f(2)
2 : Int

Reading the program top-down, I would expect to see a type error. I expect the type of f to be structurally equivalent to (Bool) {}~> Int, it so happens that it isn't. It is equivalent to (Int) {}~> Int, which is telling that Links does not respect shadowing of type names. Either we should prohibit duplicate declarations or support shadowing in the same scope.

The reason why the first Foo is picked over the second Foo) is due to refineBindings.ml.

jamescheney commented 5 years ago

Weirdly, if I do this one line at a time in the REPL, the right thing happens - i.e. you get an error

links> typename Foo = Int;
Foo = Int
links> typename Foo = Bool;
Foo = Bool
links> sig f : (Foo) {}~> Int
...... fun f(foo) { foo };
<stdin>:2: Type error: The non-recursive function definition has return type
    `Foo'
but its annotation has return type
    `Int'
In expression: fun f(foo) { foo }.
dhil commented 5 years ago

Weirdly, if I do this one line at a time in the REPL, the right thing happens - i.e. you get an error

Using the REPL the refineBindings transformation only operates on singleton lists (you can only type one binding at a time) , so, it can not reorder the bindings (that is what is happening now, because it conflates duplicate declarations).

dhil commented 5 years ago

I have a patch now that makes duplicate type names in same scope an error.

For example, for the following program

typename Foo = Int;
typename Foo = Float;

sig f : (Foo) {}~> Int
fun f(foo) { foo }

typename Foo = Bool;

Links now produces the following error

Syntax error: The type name Foo is declared multiple times.
  First declaration at file test.links:8:
    - typename Foo = Int
  Subsequent declaration at file test.links:10:
    - typename Foo = Float
  Subsequent declaration at file test.links:17:
    - typename Foo = Bool
SimonJF commented 5 years ago

Great, that sounds like a more sensible solution. Is this part of your larger nominal typename patch?

dhil commented 5 years ago

Great, that sounds like a more sensible solution. Is this part of your larger nominal typename patch?

I intend to separate them out. There is a little bit more work to be done on refineBindings. I intend to upstream that work first.

For value bindings I have adopted a similar rule, for example, the following program:

fun a() { println("a1") }
fun b() { a(); c() }
fun a() { println("a2") }
fun c() { a() }

fails to compile with the following error:

Syntax error: The function a is declared multiple times within the same binding group.
  First declaration at file test.links:19:
    - fun a() { println("a1") }
  Subsequent declaration at file test.links:21:
    - fun a() { println("a2") }

That is to ensure there is no ambiguity within binding groups (informally: a binding group is a sequence of syntactic similar bindings).

dhil commented 5 years ago

Note that the rule for recursive functions only applies on a binding group basis, so the following is a valid (and unambiguous):

fun a() { println("a1") }
fun b() { a(); c() }
fun c() { a() }
var x = 42;
fun a() { println("a2") }

This program has three binding groups: [a, b, c], [x], and [a]. The two as are distinguishable because they belong to different (syntactic) binding groups.

jamescheney commented 5 years ago

Is there a particular reason to allow this? If

fun a() { println("a1") }
fun b() { a(); c() }
fun c() { a() }
var x = 42;
fun a() { println("a2") }

appears inside a module (that exports a) then it seems ambiguous which a is exported (I guess the last one wins). The added flexibility of being able to do this doesn't seem worth it relative to the potential for confusion.

slindley commented 5 years ago

The intended behaviour is to support shadowing. The behaviour of the bug you're seeing is unsurprising to me. I suspect that the list of type bindings is being reversed somewhere. Glancing at the code in refineBindings it looks like this is probably what's going on. When one uses fold_right to traverse and reconstruct a list the resulting list is reversed. This is accounted for (with a comment!) in the refineBindings function, but not subsequently. I strongly suspect this is a regression introduced by the desugaring of mutually recursive types.

Take home message: always think hard about whether it's necessary to reverse a list generated by List.fold_right and if in doubt you probably should.

yallop commented 5 years ago

+1 for shadowing

Regarding the reversal: something's odd, since fold_right preserves the order (i.e. fold_right cons - [] is the identity).

jamescheney commented 5 years ago

Yes, the point of reversing after fold is to use List.fold_left which is tail recursive. If you're not dealing with super long lists then use List.fold_right and don't reverse.

slindley commented 5 years ago

@yallop oops yes. Brain fart. Everything I said would have made sense if I'd been talking about List.fold_left.

jstolarek commented 5 years ago

I suspect that the list of type bindings is being reversed somewhere.

For the record, there's one production in the parser that produces a list in reversed order:

handle_params:
| rev(separated_nonempty_list(COMMA,
      separated_pair(logical_expression, RARROW, pattern)))    { $1 }
slindley commented 5 years ago

@jamescheney Yes, there is a reason to allow the above code. If we disallowed it then the semantics of the REPL would unnecessarily diverge from that of whole programs. For instance, it is nice that we can write

  fun a() { println("a1") }
  fun b() { a(); c() }
  fun c() { a() }
  var x = 42;

followed by

  fun a() { println("a2") }

in the REPL, and that it behaves the same as:

  fun a() { println("a1") }
  fun b() { a(); c() }
  fun c() { a() }
  var x = 42;
  fun a() { println("a2") }
jamescheney commented 5 years ago

OK. I think in (Oca)ML there is the same behavior but the fact that you have to explicitly write "and" in between mutually recursive functions makes it clearer what is going on and easier to tell if you are rearranging code in a binding-preserving way With the Links design, if I comment out the "var x = 42" line, or inadvertently move the last "fun a" line above the "var x" line without noticing the earlier "var a", I (currently don't but should) get an error. In general, if we want shadowing / order-sensitive binding, then making the structure of mutually recursive function groups explicit in the syntax seems desirable, otherwise the Haskell-style rule where the order of declarations does not matter and duplicating names in the same scope is not allowed should be followed, and we should have a workaround to allow rebinding in the REPL, like Haskell also does.

Also. in the REPL, you can't actually combine a function group and a "var" declaration group in a single "line" anyway:

links> fun a() { println("a1") }
......   fun b() { a(); c() }
......   fun c() { a() }
......   var x = 42;
*** Parse error: <stdin>:4

    var x = 42;

So I'm not entirely persuaded that this is an important use case since we already can't do it.

jstolarek commented 5 years ago

If we disallowed it then the semantics of the REPL would unnecessarily diverge from that of whole programs.

From a perspective of some who writes a compiler this is indeed a complication. But from a perspective of a programmer that actually uses the language I have always found this kind of shadowing to be a major nuisance. So I concur with @jamescheney that:

making the structure of mutually recursive function groups explicit in the syntax seems desirable, otherwise the Haskell-style rule where the order of declarations does not matter and duplicating names in the same scope is not allowed should be followed, and we should have a workaround to allow rebinding in the REPL, like Haskell also does.

dhil commented 5 years ago

If we want to support proper shadowing, should we then make mutuality amongst bindings explicit in the syntax? We could adopt the ML convention, and introduce and as a keyword:

typename Tree(a)= [|Empty|Node:(a, Forest(a))|];
and Forest(a) = [|Nil|Cons:(Tree(a), Forest(a))|];

fun even(n) {
  if (n == 0) true
  else odd(n-1)
} and odd(n) {
  if (n == 0) false
  else even(n-1)
}

Then there is question about how we deal syntactically with mutual recursive functions that have an explicit signature, we could possibly stack them at the beginning of the mutual bindings:

sig odd : (Int) ~> Bool
sig even : (Int) ~> Bool
fun even(n) {
  if (n == 0) true
  else odd(n-1)
} and odd(n) {
  if (n == 0) false
  else even(n-1)
}

Note, currently a signature must immediately precede its implementation.

jstolarek commented 5 years ago

Note, currently a signature must immediately precede its implementation.

That sounds like a restriction that could possibly be loosened. It would also be nice to be able to group together signatures for several bindings of the same type, a'la Haskell:

eqString_RDR, unpackCString_RDR, unpackCStringFoldr_RDR,
    unpackCStringUtf8_RDR :: RdrName
eqString_RDR            = nameRdrName eqStringName
unpackCString_RDR       = nameRdrName unpackCStringName
unpackCStringFoldr_RDR  = nameRdrName unpackCStringFoldrName
unpackCStringUtf8_RDR   = nameRdrName unpackCStringUtf8Name
dhil commented 5 years ago

That sounds like a restriction that could possibly be loosened.

Yep, I believe that would require a dedicated node for signatures in the AST. As of writing, signatures are attached to their implementations by the parser. Due to our current representation of names in the frontend and our (intended) shadowing semantics, it is not sufficient to build up a simple map from names to signatures during parsing.

jamescheney commented 5 years ago

This issue has grown arms and legs. I've started a separate issue #457 for discussion of whether/how to alter the syntax and handling of mutually recursive functions, which would be an enhancement rather than bugfix (i.e. there is no urgency to resolve #457 as part of this issue).

jamescheney commented 5 years ago

(just to be clear, I know it's my fault that the issue grew arms and legs, I should probably have started the separate issue for the letrec discussion earlier).