Closed joshi-monster closed 2 months ago
Some extra context from some preliminary debugging: https://roc.zulipchat.com/#narrow/stream/395097-compiler-development/topic/spooky.20action.20at.20a.20distance/near/405401898
Maybe the example with "abc" was not ideal; When using longer strings (long enough to definitely not be small-string optimized), it always drops 2 characters after the first iteration, and then also has an empty string at the end.
So
# note the second and last element; all other strings are as expected
# the same as the previous, but with the first char dropped
expect prefixes "abc...xyz" [] == ["abc...xyz", "c...xyz", ..., "xyz", "yz", "z", ""]
Which seems to indicate that after the first iteration (probably because it copied it to be able to mutate), it inserts into the list first, but then in-place mutates the string?
"abc...xyz"
This is still a small string. Has to be at least 24 characters to be a large string.
The ...
was meant to represent an arbitrary number or characters; sorry if that wasn't clear.
In practice, I used all lowercase letters, uppercase letters, and digits (so Str.countGraphemes == 62
)
Ah ok. makes sense. The large string case doesn't surprise me that it breaks. The small string case also breaking is what really scares me. Small strings are never shared or reference counted (just copied), yet will still see the sudden change in data. On top of that, first scans of the LLVM IR looked just fine. I probably need to step through the assembly and work backwards from that.
Self contained repro with some handy dbg
s sprinkled in. Just run with roc test Bug.roc
interface Bug
exposes []
imports []
prefixes = \str, soFar ->
dbg (str, soFar)
if Str.isEmpty str then
soFar
else
graphemes = Str.graphemes str
dbg (str, graphemes)
remaining = List.dropFirst graphemes 1
dbg (str, remaining)
next = Str.joinWith remaining ""
dbg (str, next)
prefixes next (List.append soFar str)
# this is what I would have expected
expect
out = prefixes "abc" []
out == ["abc", "bc", "c"] # fails
Output:
────────────────────────────────────────────────────────────────────────────────
[/tmp/Bug.roc:5] (str, soFar) = ("abc", [])
[/tmp/Bug.roc:11] (str, graphemes) = ("abc", ["a", "b", "c"])
[/tmp/Bug.roc:13] (str, remaining) = ("abc", ["b", "c"])
[/tmp/Bug.roc:15] (str, next) = ("abc", "bc")
[/tmp/Bug.roc:5] (str, soFar) = ("bc", ["abc"])
[/tmp/Bug.roc:11] (str, graphemes) = ("bc", ["b", "c"])
[/tmp/Bug.roc:13] (str, remaining) = ("bc", ["c"])
[/tmp/Bug.roc:15] (str, next) = ("c", "c")
[/tmp/Bug.roc:5] (str, soFar) = ("c", ["abc", "c"])
[/tmp/Bug.roc:11] (str, graphemes) = ("c", ["c"])
[/tmp/Bug.roc:13] (str, remaining) = ("c", [])
[/tmp/Bug.roc:15] (str, next) = ("", "")
[/tmp/Bug.roc:5] (str, soFar) = ("", ["abc", "c", ""])
── EXPECT FAILED ──────────────────────────────────────────────── /tmp/Bug.roc ─
This expectation failed:
20│> # this is what I whould have expected
21│> expect
22│> out = prefixes "abc" []
23│> out == ["abc", "bc", "c"] # fails
When it failed, these variables had these values:
out : List Str
out = ["abc", "c", ""]
1 failed and 0 passed in 510 ms.
The failure is clearly seen here: [/tmp/Bug.roc:15] (str, next) = ("c", "c")
Suddenly, after calling Str.joinWith remaining ""
for the second time, str
has lost a character.
So far really not sure what is going on. The llvm ir that I have read looks fine. Debug prints from zig look fine as well. The string is small and being copied so it shouldn't be some sort of referencing issue. We are passing strings around by reference technically, so the bug could be related to that somehow and accidentally mutating it, but I'm not convinced.
Bug manifests after Str.joinWith, but I am not fully sure it is the cause....hard to root cause this.
Replaced Str.graphemes
with Str.toUtf8
and some extra mapping and still hit the error, so that isn't the cause.
Since you helped me get the debugger running, I thought I might as well run it on this :smile:
I've stared at the assembly and the IR for some time now, and I think I might have finally found something?
List.append %joinpointarg1 %joinpointarg
%joinpointarg1
is the initial (empty) list, and %joinpointarg
is the passed string%joinpointarg1
refers instead to the result of that List.append
, and %joinpointarg
is the value of a temporary local variable %result_value
instead%result_value
is used to store the result of Str.joinWith
, so that is next
in rocstr
, but on every following iteration, it instead appends the result of the next
instead!But I'm also really tired and I've looked at this for way too long. Maybe I'm missing something here.
So on the first iteration, it correctly appends str, but on every following iteration, it instead appends the result of the next instead!
When it recurses here prefixes next (List.append soFar str)
, it should be setting str to next. So I think that joinpoint behavior is correct.....
Oh, it is a double use of result value, that is definite the issue. So when we loop, we overwrite joinpointarg
when writing to result_value
Thanks so much for finding this 🙏
Putting this in our own ir:
procedure : `Bug.prefixes` List Str
procedure = `Bug.prefixes` (`#Derived_gen.IdentId(29)`: Str, `#Derived_gen.IdentId(30)`: List Str):
joinpoint `Bug.20` `Bug.str` `Bug.soFar`:
inc `Bug.soFar`;
inc `Bug.str`;
let `Bug.30` : {Str, List Str} = Struct {`Bug.str`, `Bug.soFar`};
let `Bug.9` : Str = CallByName `Inspect.toStr` `Bug.30`;
dbg `Bug.9`;
dec `Bug.9`;
let `Bug.28` : Int1 = CallByName `Str.isEmpty` `Bug.str`;
if `Bug.28` then
dec `Bug.str`;
ret `Bug.soFar`;
else
inc 3 `Bug.str`;
let `Bug.graphemes` : List Str = CallByName `Str.graphemes` `Bug.str`;
inc `Bug.graphemes`;
let `Bug.27` : {Str, List Str} = Struct {`Bug.str`, `Bug.graphemes`};
let `Bug.8` : Str = CallByName `Inspect.toStr` `Bug.27`;
dbg `Bug.8`;
dec `Bug.8`;
let `Bug.26` : U64 = 1i64;
let `Bug.remaining` : List Str = CallByName `List.dropFirst` `Bug.graphemes` `Bug.26`;
inc `Bug.remaining`;
let `Bug.25` : {Str, List Str} = Struct {`Bug.str`, `Bug.remaining`};
let `Bug.7` : Str = CallByName `Inspect.toStr` `Bug.25`;
dbg `Bug.7`;
dec `Bug.7`;
let `Bug.24` : Str = "";
let `Bug.next` : Str = CallByName `Str.joinWith` `Bug.remaining` `Bug.24`;
dec `Bug.24`;
dec `Bug.remaining`;
inc `Bug.next`;
let `Bug.23` : {Str, Str} = Struct {`Bug.str`, `Bug.next`};
let `Bug.6` : Str = CallByName `Inspect.toStr` `Bug.23`;
dbg `Bug.6`;
dec `Bug.6`;
let `Bug.22` : List Str = CallByName `List.append` `Bug.soFar` `Bug.str`;
jump `Bug.20` `Bug.next` `Bug.22`;
in
jump `Bug.20` `#Derived_gen.IdentId(29)` `#Derived_gen.IdentId(30)`;
This write let
Bug.next: Str = CallByName
Str.joinWith`Bug.remaining
Bug.24
;`
On the second iteration of this function and beyond is also writing to Bug.str
.
This jump Bug.20 Bug.next Bug.22;
makes both Bug.str
and Bug.next
the exact same allocation.
So I think Bug.next
can't writing to an alloca that is hoisted above the joinpoint. It needs to write to a local alloca then at the last minute right before the jump Bug.20 Bug.next Bug.22;
, we need to copy over the local to the longer living alloca.
@folkertdev:
I think this line is the root of the bug: https://github.com/roc-lang/roc/blob/f795d0856a8a68747aa7bc975827bd02ff8d4610/crates/compiler/gen_llvm/src/llvm/scope.rs#L94
We need to wait and only write to these right as a jump is called. We can not write to them early. By inserting them into symbols here, they get written to too early.
actually, that might not quite be the root issue. Might just need to add an extra copy to each arg here: https://github.com/roc-lang/roc/blob/f795d0856a8a68747aa7bc975827bd02ff8d4610/crates/compiler/gen_llvm/src/llvm/build.rs#L3416-L3418
but something around these in general.
in either case the alloca must be outside of the loop otherwise the stack grows on every iteration. I do think the problem is that the "next" variable is never realized in this case. We do have some logic somewhere that tries to elide copies from one alloca into another, maybe that is at play here?
Oof I just took the time to minimize a miscompilation I encountered in Advent of Code and I think it's a duplicate of this:
interface Issue6139
exposes [buggy]
imports []
expect
buggyAnswer = buggy "A" []
buggyAnswer == ["A", "B", "C", "D"]
buggy = \node, seen ->
if List.contains seen node then
seen
else
dbg node # node = "B"
nextNode = stepNode node
dbg node # node = "C"
buggy nextNode (List.append seen node)
stepNode = \node ->
when node is
"A" -> "B"
"B" -> "C"
"C" -> "D"
"D" -> "A"
_ -> crash ""
I also posted this on the chat, but I think actually making an issue sounds like a good idea :slightly_smiling_face:
I tried to make this example simpler by using lists instead of strings, but could not reproduce it with a lists version. So it seems to maybe also be related to the specifics of how strings are handled? Unfortunately I have have no idea about roc's internals to make a good guess what might be happening here, since I just started actually writing roc code.
If it helps, I could repro this with todays nighly build and an old one I still had from 11/28. It also did not make a difference whether I used
basic-cli
0.6.2 or 0.7.0.