roc-lang / roc

A fast, friendly, functional language.
https://roc-lang.org
Universal Permissive License v1.0
4.19k stars 299 forks source link

Effect Interpretter: `roc check` stack overflow #5464

Open bhansconnect opened 1 year ago

bhansconnect commented 1 year ago

Just setup basic effect interpreter changes for: https://github.com/roc-lang/basic-cli

The branch is effect-interpreter-2.

try roc check examples/form.roc. output is:

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
[1]    59798 abort      roc check examples/form.roc

The example is pretty minimal but in a few files. All important files can be seen in this commit: https://github.com/roc-lang/basic-cli/commit/410d22ed3c353e510cfaf5e96ba0c7c6b416b152

ayazhafiz commented 1 year ago

@bhansconnect could you please cut down the reproduction to the minimal example possible? In a standalone location that does not need the whole basic-cli.

bhansconnect commented 1 year ago

Will try to do later tonight or tomorrow

bhansconnect commented 1 year ago

Here is a single file repro. Just run roc check on it.

Added 2 mains cause I think the second main may point to the root of the issue. Roc seems to think anything returning a Task is "a weird self-referential type".

interface PleaseTypeCheck
    exposes [main, succeed, fail, await, outLine, inLine]
    imports []

Op : [
    StdoutLine Str ({} -> Op),
    StdinLine (Str -> Op),
    Done,
]

Task ok err := (Result ok err -> Op) -> Op

succeed : ok -> Task ok *
succeed = \ok -> @Task \toNext -> toNext (Ok ok)

fail : err -> Task * err
fail = \err-> @Task \toNext -> toNext (Err err)

await : Task ok1 err, (ok1 -> Task ok2 err) -> Task ok2 err
await = \@Task fromResult, next ->
    continue <- @Task
    result <- fromResult
    @Task inner =
        when result is
            Ok v -> next v
            Err e -> fail e

    inner continue

outLine : Str -> Task {} *
outLine = \str ->
    toNext <- @Task
    {} <- StdoutLine str
    toNext (Ok {})

inLine : Task Str *
inLine =
    toNext <- @Task
    str <- StdinLine
    toNext (Ok str)

main : Task {} []
main =
    _ <- await (outLine "What's your first name?")
    firstName <- await inLine
    _ <- await (outLine "What's your last name?")
    lastName <- await inLine
    outLine "Hi, \(firstName) \(lastName)! 👋"

# Note: if you just use a direct main, it seems that outLine and inLine have issues.
# Roc thinks they are infinitely recursive.
# Just comment out the main above and check this main instead.
# Seems specifically relate to the function specifying its type.
# main : Task {} []
# main = outLine "Test"
bhansconnect commented 1 year ago

@ayazhafiz, Any chance you will get a chance to look at this some time soon?

ayazhafiz commented 1 year ago

Will take a look today

ayazhafiz commented 1 year ago

I've been taking a look at this but have hit somewhat of a wall. I think we'll need to do some unit tests to figure out exactly the problem.

I think to start we should reduce this example much further, to something that does succeed in inference but whose type doesn't seem to make sense.

# +opt infer:print_only_under_alias
# +opt can:allow_errors

interface Test exposes [inLine] imports []

Op : [
    A ({} -> Op),
    EOF,
]

Task : ({} -> Op) -> Op

inLine : Task
inLine =
#^^^^^^{-1}
    \toNext -> A (\{} -> toNext {})

uitest says the type of inLine here is

({} -||-> [A ({} -|Lam7 ({} -||-> ([A ({} -a-> b), EOF] as b)) as a|-> b), EOF] as b)
-|inLine|-> ([A ({} -|Lam7 ({} -||-> ([A ({} -a-> b), EOF]c as b)) as a|-> b), EOF] as b)

(I've taken the liberty of cleaning up unnecessary unspecialized lambda sets and extension variables, which don't matter here.)

Note that the capture of the lambda set has a return type that is the recursive Op, the same as the return type of the function the lambda set is a part of. This is all expected, except that the inner return type is not a recursion pointer - it is the unrolled recursive tag union!

({} -||-> [A ({} -|Lam7 ({} -||-> ([A ({} -a-> b), EOF] as b)) as a|-> b), EOF] as b)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^ this should be just "b"!

That is a problem, because the occurs-checker then sees that the lambda set in that position is part of a cycle (indeed it is, but only if b is unrolled in that position - see below). But, really that lambda should not be recursive - the only recursive construct in this procedure is Op, and the lambda is recursive only that is captured by the recursive type Op. It is clear to see that the stack function toNext does not actually capture itself, although the Op type is recursively composed of it.

This is plenty confusing for me, since this is pretty far from the surface syntax already. So let's take a step back and look at the inferred type we expect. Hopefully that will make things clearer.

Suppose we just had

inLine = \toNext -> A (\{} -> toNext {})

this has type

({} -||-> b) -|inline|-> [A ({} -|Lam7 ({} -||-> b)|-> b)]

And just the annotation

inline : Task

gives us the type

({} -||-> ([A ({} -||-> a), EOF] as a)) -||-> ([A ({} -||-> b), EOF] as b)

Great. Now let's trace through unification of these two

:  ({} -||1-> a)                             -|inLine|->  [A ({} -|Lam7 ({} -||1-> a)|3-> a)]
~  ({} -||4-> ([A ({} -||5-> c), EOF] as c)) -||->       ([A ({} -||8-> d), EOF] as d)

Here I've used -||1-> to say "this lambda set is named the variable 1".

We can now see that we have the following constraints:

a = c = ([A ({} -||5-> c), EOF] as c)
a = d = ([A ({} -||8-> d), EOF] as d)
d     = ([A ({} -|Lam7 ({} -||1-> a)|3-> a)])
||1 = ||4

putting that together we get the solution

a = c = d = ([A ({} -|Lam7 ({} -||14-> d)|583-> d), EOF] as d)

where -||583-> means this is a lambda set where 5, 8, 3 are now equivalent variables.

Okay, we can plug this back into the unification and see we end up with the type


:  ({} -||1-> a)                             -|inLine|->  [A ({} -|Lam7 ({} -||1-> a)|3-> a)]
~  ({} -||4-> ([A ({} -||5-> c), EOF] as c)) -||->       ([A ({} -||8-> d), EOF] as d)
=>
({} -||14-> ([A ({} -|Lam7 ({} -||14-> d)|583-> d), EOF] as d))
-|inLine|-> ([A ({} -|Lam7 ({} -||14-> d)|583-> d), EOF] as d)

This is what I expect - that Lam7 is not explicitly recursive in its lambda set, but is accounted for being a part of the recursive type d, because it is a part of the definition of d.

Let's look at the difference between this and the originally inferred type.

EXPECTED

({} -||-> ([A ({} -|Lam7 ({} -||-> d)|-> d), EOF] as d))

ACTUAL

({} -||-> ([A ({} -|Lam7 ({} -||-> ([A ({} -a-> b), EOF] as b)) as a|-> b), EOF] as b))

if we unroll the nested d in the type of the capture of Lam7 in the expected type we end up with

({} -||14-> ([A ({} -|Lam7 ({} -||14-> ([A ({} -|Lam7 ({} -||14-> d)|583-> d), EOF] as d))|583-> d), EOF] as d))

and now, indeed, the lambda set given by the equivalent variables 5/8/3 appears recursive - the occurs checker will see it as so, and try to mark it recursive, since 583 appears inside the captures of 583.

So, we need to repack nested recursive tag unions that appear under themselves to be recursion pointers, and not unrolled in this form. How exactly we get there is not totally clear to me yet.

I suspect this to be the main problem in this issue.