hkust-taco / mlscript

The MLscript programming language. Functional and object-oriented; structurally typed and sound; with powerful type inference. Soon to have full interop with TypeScript!
https://hkust-taco.github.io/mlscript
MIT License
175 stars 27 forks source link

New quasiquote implementation #182

Closed NeilKleistGao closed 9 months ago

NeilKleistGao commented 1 year ago

Adapted from #164.

LPTK commented 10 months ago

It would be nice if this gave a more informative error:

fun foo =
  let bar = 1
  code"bar"
//│ ╔══[ERROR] identifier not found: bar
//│ ║  l.242:     code"bar"
//│ ╙──                ^^^

But only if making it so does not require complicating the logic that tracks these things too much.

LPTK commented 10 months ago

Re: lexing. Ok, after seeing your code it seems my proposal wasn't that good. I was proposing to lex code"2 + 2" as two tokens QQ and "2 + 2", where the latter just uses the normal string lexing function. But it seems you had to write a separate qstr function instead, which kind of defeats the purpose... And seeing how complicated this makes the following code, as in:


        case (c: QQCODE, l0) :: (LITVAL(StrLit(s)), l1) :: rest => {
          // * Since we use a fresh NewLexer with a different Origin object to tokenize quoted code,
          // * we must recover the Origin object and location offset to get correct error information
          def recoverLocs(tokens: Ls[Stroken -> Loc])(implicit offset: Int): Ls[Stroken -> Loc] = tokens match {
            case (b @ BRACKETS(k, contents), l0) :: rest =>
              BRACKETS(k, recoverLocs(contents))(Loc(b.innerLoc.spanStart + offset, b.innerLoc.spanEnd + offset, origin)) ->
                Loc(l0.spanStart + offset, l0.spanEnd + offset, origin) :: recoverLocs(rest)
            case (s, l0) :: rest => s -> Loc(l0.spanStart + offset, l0.spanEnd + offset, origin) :: recoverLocs(rest)
            case Nil => Nil
          }
          val fph = new FastParseHelpers(s)
          val nuOrigin = Origin(origin.fileName, origin.startLineNum, fph)
          val lexer = new NewLexer(nuOrigin, raise, dbg)
          val tokens = recoverLocs(lexer.bracketedTokens)(l1.spanStart)
          go(rest, false, stack, BRACKETS(Quasiquote, tokens)(l1) -> Loc(l0.spanStart, l1.spanEnd + c.quotations.length, origin) :: acc)
        }

and seeing that it still displays some weirdly incorrect locations, then I think this is clearly not worth it. The previous code seemed better! Sorry for the needless churn.

NeilKleistGao commented 9 months ago

~This doesn't work and I don't see why:~

class Pair[A, B](fst: A, snd: B)
//│ class Pair[A, B](fst: A, snd: B)

x `=> `Pair(y `=> x, x)
//│ ╔══[ERROR] Type mismatch in application:
//│ ║  l.39:  x `=> `Pair(y `=> x, x)
//│ ║               ^^^^^^^^^^^^^^^^^
//│ ╟── code fragment of type `Code[forall ?A ?B. (fst: A, snd: B) -> Pair[?A, ?B], nothing]` is not a function
//│ ║  l.39:  x `=> `Pair(y `=> x, x)
//│ ╙──             ^^^^^
//│ Code[anything -> nothing, nothing]

EDIT: never mind, the correct way is:

x `=> `Pair`(y `=> x, x)
//│ Code[forall 'a 'B 'A. ('a & 'B) -> Pair['A, 'B], nothing]
//│   where
//│     'A :> anything -> 'a

Note: this also doesn't work, but it's probably fine for now:

x `=> `[y `=> x, x]
//│ ╔══[PARSE ERROR] This quote syntax is not supported yet
//│ ║  l.12:  x `=> `[y `=> x, x]
//│ ╙──             ^
//│ TEST CASE FAILURE: There was an unexpected parse error
//│ ╔══[PARSE ERROR] Not a recognized type
//│ ║  l.12:  x `=> `[y `=> x, x]
//│ ╙──               ^^^^^^^
//│ TEST CASE FAILURE: There was an unexpected parse error
//│ ╔══[ERROR] Type application syntax is not yet supported
//│ ║  l.12:  x `=> `[y `=> x, x]
//│ ╙──       ^^^^^^^^^^^^^^^^^^^
//│ TEST CASE FAILURE: There was an unexpected type error
//│ ╔══[ERROR] Type mismatch in unquote:
//│ ║  l.12:  x `=> `[y `=> x, x]
//│ ║                            ^
//│ ╙── undefined literal of type `()` is not an instance of type `Code`
//│ Code[anything -> error, nothing]

I created a new issue #211 to track all unsupported quasiquote syntaxes so that I can implement them one by one in the following PR.

LPTK commented 9 months ago
fun foo =
  `let x = `1
  mut let v = x
  `let x = `true
  x
//│ ╔══[PARSE ERROR] Unexpected 'mut' keyword in expression position
//│ ║  l.293:     mut let v = x
//│ ╙──           ^^^

Note that these work fine when the lets are not quoted.

class Ref[A](init: A) { mut val value: A = init }
//│ class Ref[A](init: A) {
//│   val value: A
//│ }

// FIXME
:p
fun foo =
  `let x = `1
  let v = Ref(x)
  `let x = `true
  set v.value = x
  x
//│ |#fun| |foo| |#=|→|`|#let| |x| |#=| |`|1|↵|#let| |v| |#=| |Ref|(|x|)|↵|`|#let| |x| |#=| |`|true|↵|#set| |v|.value| |#=| |x|↵|x|←|
//│ AST: TypingUnit(List(NuFunDef(None,Var(foo),None,List(),Left(Blk(List(Quoted(Let(false,Var(x),Unquoted(Quoted(IntLit(1))),Unquoted(Let(false,Var(v),App(Var(Ref),Tup(List((None,Fld(_,Var(x)))))),Quoted(Let(false,Var(x),Unquoted(Quoted(Var(true))),Unquoted(Assign(Sel(Var(v),Var(value)),Var(x))))))))), Var(x)))))))
//│ Parsed: fun foo = {code"Let(false,Var(x),Unquoted(Quoted(IntLit(1))),Unquoted(Let(false,Var(v),App(Var(Ref),Tup(List((None,Fld(_,Var(x)))))),Quoted(Let(false,Var(x),Unquoted(Quoted(Var(true))),Unquoted(Assign(Sel(Var(v),Var(value)),Var(x))))))))"; x};
//│ ╔══[ERROR] Type mismatch in unquote:
//│ ║  l.312:     set v.value = x
//│ ║             ^^^^^^^^^^^^^^^
//│ ╙── assignment of type `()` is not an instance of type `Code`
//│ ╔══[WARNING] Expression in statement position should have type `()`.
//│ ╟── Use a comma expression `... , ()` to explicitly discard non-unit values, making your intent clearer.
//│ ╟── Type mismatch in quasiquote:
//│ ║  l.309:     `let x = `1
//│ ║                      ^^
//│ ║  l.310:     let v = Ref(x)
//│ ║           ^^^^^^^^^^^^^^^^
//│ ║  l.311:     `let x = `true
//│ ║           ^^^^^^^^^^^^^^^^
//│ ║  l.312:     set v.value = x
//│ ║           ^^^^^^^^^^^^^^^^^
//│ ╙── code fragment of type `Code[forall ?a. ?a, ?b | ?c]` does not match type `()`
//│ ╔══[ERROR] identifier not found: x
//│ ║  l.313:     x
//│ ╙──           ^
//│ fun foo: error

By the way, above, the pretty-printing of the term (the part after "//│ Parsed:" seems broken as it shows plain ASTs. We should probably update this to correctly pretty-print the terms.

LPTK commented 9 months ago
NeilKleistGao commented 9 months ago
  • [ ] There are some parsing weirdnesses with let bindings:
fun foo =
  `let x = `1
  mut let v = x
  `let x = `true
  x
//│ ╔══[PARSE ERROR] Unexpected 'mut' keyword in expression position
//│ ║  l.293:       mut let v = x
//│ ╙──             ^^^

Note that these work fine when the lets are not quoted.

  • [ ] Another one:
class Ref[A](init: A) { mut val value: A = init }
//│ class Ref[A](init: A) {
//│   val value: A
//│ }

// FIXME
:p
fun foo =
  `let x = `1
  let v = Ref(x)
  `let x = `true
  set v.value = x
  x
//│ |#fun| |foo| |#=|→|`|#let| |x| |#=| |`|1|↵|#let| |v| |#=| |Ref|(|x|)|↵|`|#let| |x| |#=| |`|true|↵|#set| |v|.value| |#=| |x|↵|x|←|
//│ AST: TypingUnit(List(NuFunDef(None,Var(foo),None,List(),Left(Blk(List(Quoted(Let(false,Var(x),Unquoted(Quoted(IntLit(1))),Unquoted(Let(false,Var(v),App(Var(Ref),Tup(List((None,Fld(_,Var(x)))))),Quoted(Let(false,Var(x),Unquoted(Quoted(Var(true))),Unquoted(Assign(Sel(Var(v),Var(value)),Var(x))))))))), Var(x)))))))
//│ Parsed: fun foo = {code"Let(false,Var(x),Unquoted(Quoted(IntLit(1))),Unquoted(Let(false,Var(v),App(Var(Ref),Tup(List((None,Fld(_,Var(x)))))),Quoted(Let(false,Var(x),Unquoted(Quoted(Var(true))),Unquoted(Assign(Sel(Var(v),Var(value)),Var(x))))))))"; x};
//│ ╔══[ERROR] Type mismatch in unquote:
//│ ║  l.312:       set v.value = x
//│ ║               ^^^^^^^^^^^^^^^
//│ ╙── assignment of type `()` is not an instance of type `Code`
//│ ╔══[WARNING] Expression in statement position should have type `()`.
//│ ╟── Use a comma expression `... , ()` to explicitly discard non-unit values, making your intent clearer.
//│ ╟── Type mismatch in quasiquote:
//│ ║  l.309:       `let x = `1
//│ ║                        ^^
//│ ║  l.310:       let v = Ref(x)
//│ ║             ^^^^^^^^^^^^^^^^
//│ ║  l.311:       `let x = `true
//│ ║             ^^^^^^^^^^^^^^^^
//│ ║  l.312:       set v.value = x
//│ ║             ^^^^^^^^^^^^^^^^^
//│ ╙── code fragment of type `Code[forall ?a. ?a, ?b | ?c]` does not match type `()`
//│ ╔══[ERROR] identifier not found: x
//│ ║  l.313:       x
//│ ╙──             ^
//│ fun foo: error

By the way, above, the pretty-printing of the term (the part after "//│ Parsed:" seems broken as it shows plain ASTs. We should probably update this to correctly pretty-print the terms.

It is just because a `let ... requires an expression following the in keyword.

fun foo =
  let x = ...
  mut let y = ... // ok, because both `let`s are statements.

To write mut statements inside quoted let bindings, we need to use block:

fun foo =
  `let x = `1 in
    mut let v = x
    `let x = `true
    x
//│ fun foo: Code[true, nothing]

Similar reasons for the second case. I just fixed the pretty-printing of qq.

NeilKleistGao commented 9 months ago
  • [ ] I don't think this should type check. Indeed, at runtime it will produce an ill-scoped program:

    x `=>
    let v = Ref(x)
    let _ = y `=>
      set v.value = y
      `0
    v.value
    //│ Code[forall 'a. 'a -> 'a, nothing]

    I wonder what's going on here. Any ideas?

It is probably related to the Ref class and its behaviors. Here are two similar cases:

let foo =
  let r = Ref(`0)
  x `=> ((set r.value = x), `0), r.value
//│ let foo: Code[0, nothing]

let foo =
  mut let r = `0
  x `=> ((set r = x), `0), r
//│ let foo: Code[0, nothing] | Var['a, in ??x out ??x0]

:e
run(foo)
//│ ╔══[ERROR] Type error in application
//│ ║  l.121:   run(foo)
//│ ║           ^^^^^^^^
//│ ╟── type variable `?x` leaks out of its scope
//│ ╟── into type `nothing`
//│ ╟── adding a type annotation to any of the following terms may help resolve the problem
//│ ╟── • this definition:
//│ ║  l.116:     mut let r = `0
//│ ╙──                   ^^^^^^
//│ 0 | error
LPTK commented 9 months ago

It is just because a `let ... requires an expression following the in keyword.

Ok but there's no reason that this should be the case. We should be able to write the code I wrote above. And we should not get the really weird and broken parse shown in one of my example, where the let rhs somehow includes the next lines in the current block.

LPTK commented 9 months ago

It is probably related to the Ref class and its behaviors. Here are two similar cases:

At, that's probably an instance of unsound distributivity (which shouldn't be used in the presence of mutation). Can you try with :DontDistributeForalls

NeilKleistGao commented 9 months ago

It is probably related to the Ref class and its behaviors. Here are two similar cases:

At, that's probably an instance of unsound distributivity (which shouldn't be used in the presence of mutation). Can you try with :DontDistributeForalls

Seems the same:

:DontDistributeForalls

class Ref[A](init: A) { mut val value: A = init }
//│ class Ref[A](init: A) {
//│   val value: A
//│ }

let foo =
  let r = Ref(`0)
  x `=> ((set r.value = x), `0), r.value
//│ let foo: Code[0, nothing]

let foo =
  mut let r = `0
  x `=> ((set r = x), `0), r
//│ let foo: Code[0, nothing] | Var['a, in ??x out ??x0]
NeilKleistGao commented 9 months ago

It is just because a `let ... requires an expression following the in keyword.

Ok but there's no reason that this should be the case. We should be able to write the code I wrote above. And we should not get the really weird and broken parse shown in one of my example, where the let rhs somehow includes the next lines in the current block.

I suggest that we force requiring an in keyword for quoted let bindings so that we will not get this confusing result above. `let x = `1 alone doesn't make sence to me. We should either provide the rhs or use let x = `1 here.

LPTK commented 9 months ago

`let x = `1 alone doesn't make sence to me

It does to me. Look, you can always convert

let x = a in
  log(x)
  x

to

let x = a
log(x)
x

Why couldn't one convert

`let x = a in
  `log`(x)
  x

to

`let x = a
`log`(x)
x

using the exact same logic?

Anyway, for now I'd be fine with the kludge of forcing in for quoted let bindings. But then I'd rather make it `in, making it more obvious that it can't be omitted.

In any case, the parser should be fixed so the RHS of a let binding doesn't parse past the newline.

NeilKleistGao commented 9 months ago

`let x = `1 alone doesn't make sence to me

It does to me. Look, you can always convert

let x = a in
  log(x)
  x

to

let x = a
log(x)
x

Why couldn't one convert

`let x = a in
  `log`(x)
  x

to

`let x = a
`log`(x)
x

using the exact same logic?

  • [ ] By the way, neither works at the moment.

The first two cases are not equivalent. Notice that in the first case let x ... is an expression and in the second one that is a statement:

:NewDefs

let a = 42
//│ let a: 42
//│ a
//│   = 42

let x = a in // * expression
  log(x)
  x
//│ 42
//│ res
//│     = 42
//│ // Output
//│ 42

:e
x
//│ ╔══[ERROR] identifier not found: x
//│ ║  l.18:    x
//│ ╙──         ^
//│ error
//│ Code generation encountered an error:
//│   unresolved symbol x

let x = a // * statement
log(x)
x
//│ let x: 42
//│ 42
//│ x
//│   = 42
//│ res
//│     = undefined
//│ // Output
//│ 42
//│ res
//│     = 42

x
//│ 42
//│ res
//│     = 42

let x = a alone is handled by the following logic and has different semantics:

case ModifierSet(mods, (KEYWORD(kwStr @ ("fun" | "val" | "let")), l0) :: c) =>

That's why I said `let x = aalone didn't make sense because we do not care about quoted statements.

The third case works and just warns that `log`(x) is an expression and should have type nothing, or we should use a comma here.

Anyway, for now I'd be fine with the kludge of forcing in for quoted let bindings. But then I'd rather make it `in, making it more obvious that it can't be omitted.

In any case, the parser should be fixed so the RHS of a let binding doesn't parse past the newline.

Yes `in is a better choice.

LPTK commented 9 months ago

let x = a alone is handled by the following logic and has different semantics:

case ModifierSet(mods, (KEYWORD(kwStr @ ("fun" | "val" | "let")), l0) :: c) =>

That's why I said `let x = aalone didn't make sense because

You're saying it "doesn't make sense" from the implementation perspective. You have to take a step back and consider the high-level semantic perspective. It makes perfect sense for users to write this code.

we do not care about quoted statements.

Why would you say that? I do care to write the code I provided above. It happened when I was actually trying to write a real example. Why do you say you don't care about that? It's kind of rude TBH.

The third case works and just warns that `log`(x) is an expression and should have type nothing

Unit, not nothing...

LPTK commented 9 months ago

The first two cases are not equivalent.

They ARE equivalent. I don't care whether things are statements or expressions under the hood. I just care about user-facing syntax and semantics. And you should too.

LPTK commented 9 months ago