hkust-taco / mlscript

The MLscript programming language. Functional and object-oriented; structurally typed and sound; with powerful type inference. Soon to have full interop with TypeScript!
https://hkust-taco.github.io/mlscript
MIT License
175 stars 27 forks source link

Fix name duplication in code generation for new definition typing #197

Closed chengluyu closed 11 months ago

chengluyu commented 11 months ago

We might notice a strange behavior of code generation: if we declare a symbol x and then define it, the generated JavaScript function name will have a numeric postfix x1, which indicates that some other x was shadowed. But this is the very first time where x is defined in this file.

diff --git a/shared/src/test/diff/nu/CtorSubtraction.mls b/shared/src/test/diff/nu/CtorSubtraction.mls
--- a/shared/src/test/diff/nu/CtorSubtraction.mls
+++ b/shared/src/test/diff/nu/CtorSubtraction.mls
@@ -9,11 +9,11 @@
 fun x: ('a & Object) -> ('a \ Cls)
 fun x = case
   Cls then error
   y then y
 //│ fun x: forall 'b. (Cls | Object & 'b & ~#Cls) -> 'b
 //│ fun x: forall 'a. (Object & 'a) -> ('a & ~Cls)

 x : Int -> Int
 //│ Int -> Int
 //│ res
-//│     = [Function: x1] // <---- Oops?
+//│     = [Function: x]

Well, this is caused by an imperfect workaround for translating classes in new definition typing. Consider the following code.

let f x = x + 1
class Foo(value: Int) {
  let y = f of value
}

Code generation lifts class definitions (e.g., Foo) and translates them into a typing unit class before translating top-level let bindings (e.g., f). So, the Scope should contains some sort of stub symbols of top-level let bindings before translating classes. The current implementation does not handle this well, as it just declares ValueSymbol in the first place.

https://github.com/hkust-taco/mlscript/blob/231d9ec5106db29f45b4be86cae514d55ce07995/shared/src/main/scala/mlscript/JSBackend.scala#L1404-L1412

Then, after the translation of the typing unit class, when translating the top-level let bindings, the symbols will be declared twice and the runtime will be renewed, thus the runtime name comes with a numeric postfix. This sometimes cause runtime errors of undefined references. For example,

fun foo: Int -> Int
fun foo = x => x + 1
class Bar {
  fun calc(x) = foo(x)
}
//│ fun foo: Int -> Int
//│ class Bar {
//│   constructor()
//│   fun calc: Int -> Int
//│ }
//│ fun foo: Int -> Int

foo
//│ Int -> Int
//│ res
//│     = [Function: foo1] <--- Note that `foo` has runtime name `foo1`

:re
(new Bar()).calc(0)
//│ Int
//│ res
//│ Runtime error:
//│   ReferenceError: foo is not defined <--- But the class thinks `foo` is `foo`.

This PR fixes this problem by adding a Boolean field forNewDefsDryRun to ValueSymbol, which indicates if this symbol is declare in the situation as described above. If new symbols are shadowing this symbol, we will reuse the runtime name rather than allocating new names.

NeilKleistGao commented 11 months ago

Related PRs: #167, #186

chengluyu commented 11 months ago

I agree. I think the original approach (declare top-level let bindings, and unregister them if there's any error) is imperfect and hacky. So, this is actually a hacky fix to address the imperfectness of previous approach and will be soon removed by the help of PreTyper.

I will try your example and add it to tests.

chengluyu commented 11 months ago

The test cases you suggested work.