@derrickturk have you had a chance to investigate this at all? I know from Zulip you were going to look into it some.
If yes, any findings? If no, do you need assistance?
@derrickturk I believe we discussed some of this on Zulip. Can you bring the relevant discussion points over to this issue so they can help anyone else who tries to tackle this in the future?
Sharing my notes here from last month after a long delay. I spent some time back in October poking at this after discussing some ideas on Zulip.
I've been testing this all with a debug build of ponyc, in an Arch setup on WSL2:
~/.local/bin/bin/ponyc --version
0.51.3-2665ee7c [debug]
Compiled with: LLVM 14.0.3 -- Clang-14.0.6-x86_64
Defaults: pic=true
Executions of my test program (same as above) are ending in a segmentation fault most of the time, with the full expected output. Sometimes I get the full expected output with no segmentation fault; every once in a while I see no final output and no segmentation fault.
Segmentation faults appear in the tracing functions invoked via ponyint_actor_final, in turn invoked via the cycle checker:
Process 8020 stopped
* thread #1, name = 'intpony', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x000055555559f201 intpony`Array_I64_val_Trace + 1
intpony`Array_I64_val_Trace:
-> 0x55555559f201 <+1>: movq 0x18(%rsi), %rsi
0x55555559f205 <+5>: callq 0x5555555f85a0 ; pony_trace at trace.c:104
0x55555559f20a <+10>: popq %rax
0x55555559f20b <+11>: retq
(lldb) bt
* thread #1, name = 'intpony', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
* frame #0: 0x000055555559f201 intpony`Array_I64_val_Trace + 1
frame #1: 0x00005555555f85cb intpony`pony_trace(ctx=0x00007fffe68ac600, p=0x4000007fffe66cb8) at trace.c:105:3
frame #2: 0x000055555559f20a intpony`Array_I64_val_Trace + 10
frame #3: 0x00005555555f85cb intpony`pony_trace(ctx=0x00007fffe68ac600, p=0x00007fffe66cc801) at trace.c:105:3
frame #4: 0x000055555559f77c intpony`Array_u3_t2_$1$10_iso_$1$10_iso_collections__MapEmpty_val_collections__MapDeleted_val_Trace + 28
frame #5: 0x00005555555ef11b intpony`ponyint_actor_final(ctx=0x00007ffff7cbbc48, actor=0x00007fffe68ac600) at actor.c:859:5
frame #6: 0x00005555555f2b47 intpony`final(ctx=0x00007ffff7cbbc48, self=0x00007ffff7cbbe00) at cycle.c:893:7
frame #7: 0x00005555555f29d3 intpony`ponyint_cycle_terminate at cycle.c:1183:3
frame #8: 0x00005555555fee5f intpony`ponyint_sched_shutdown at scheduler.c:1143:3
frame #9: 0x00005555555fd2ef intpony`ponyint_sched_start(library=false) at scheduler.c:1330:5
frame #10: 0x00005555555ff6f0 intpony`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x00007fffffffe8d0) at start.c:342:7
frame #11: 0x00005555555ee3cb intpony`main + 267
frame #12: 0x00007ffff7ce1290 libc.so.6`___lldb_unnamed_symbol3141 + 128
frame #13: 0x00007ffff7ce134a libc.so.6`__libc_start_main + 138
frame #14: 0x000055555559da85 intpony`_start at start.S:115
With --ponynoblock, I have not received any segmentation faults, but do sometimes see the final output fail to appear.
Following a suggestion on Zulip from @SeanTAllen, I looked into the possibility of an issue with "orphaned" actors. Setting a breakpoint/watch on pony_create does not show any actors created with orphaned = false. I was unable to find any other code pertinent to the orphaned-actor issue, but I'm happy to poke in other places with a little direction.
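For reference, a conditional breakpoint roughly like the following is what I mean (illustrative; it assumes the debug build keeps the orphaned parameter of pony_create visible to the debugger):
(lldb) breakpoint set --name pony_create --condition 'orphaned == false'
(lldb) run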
I'm interested in helping find the underlying issue with the cycle checker, but I'm at least as interested in understanding why I see the early-termination behavior. It's possible I've misunderstood the quiescence rules. I'm going to try to reduce my example program to something more minimal.
Well, I've reduced the program to reproduce this bug, or at least I think I have. The specific manifestation of the segmentation fault varied as I made changes. The current single-file test program is at the end of this post. The current behavior is:
The last backtrace I captured is:
(lldb) run
Process 15633 launched: '/home/derrick/intpony/intpony' (x86_64)
Process 15633 stopped
* thread #1, name = 'intpony', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fff0000008e)
frame #0: 0x0000555555568aaf intpony`ponyint_actor_final(ctx=0x00007ffff7cbbc48, actor=0x00007fffe6707200) at actor.c:858:25
855 ctx->current = actor;
856
857 // Run the actor finaliser if it has one.
-> 858 if(actor->type->final != NULL)
859 actor->type->final(actor);
860
861 // Run all outstanding object finalisers.
(lldb) bt
* thread #1, name = 'intpony', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fff0000008e)
* frame #0: 0x0000555555568aaf intpony`ponyint_actor_final(ctx=0x00007ffff7cbbc48, actor=0x00007fffe6707200) at actor.c:858:25
frame #1: 0x000055555556c4f7 intpony`final(ctx=0x00007ffff7cbbc48, self=0x00007ffff7cbbe00) at cycle.c:893:7
frame #2: 0x000055555556c383 intpony`ponyint_cycle_terminate at cycle.c:1183:3
frame #3: 0x0000555555577eff intpony`ponyint_sched_shutdown at scheduler.c:1143:3
frame #4: 0x000055555557638f intpony`ponyint_sched_start(library=false) at scheduler.c:1330:5
frame #5: 0x0000555555578790 intpony`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x00007fffffffe8b0) at start.c:342:7
frame #6: 0x0000555555567d67 intpony`main + 231
frame #7: 0x00007ffff7ce1290 libc.so.6`___lldb_unnamed_symbol3141 + 128
frame #8: 0x00007ffff7ce134a libc.so.6`__libc_start_main + 138
frame #9: 0x0000555555562795 intpony`_start at start.S:115
At one point while reducing the code, when I still had an Array[I64] serving as an input queue for each Cpu, I saw a segmentation fault in a different place:
* thread #1, name = 'intpony', stop reason = signal SIGSEGV: invalid address (fault address: 0x50)
frame #0: 0x00005555555773cf intpony`final_small(chunk=0x00007fffe6b851c0, force_finalisers_mask=4294967295) at heap.c:196:5
193 p = chunk->m + (bit << HEAP_MINBITS);
194
195 // run finaliser
-> 196 pony_assert((*(pony_type_t**)p)->final != NULL);
197 (*(pony_type_t**)p)->final(p);
198
199 // clear bit just found in our local finaliser map
(lldb) bt
* thread #1, name = 'intpony', stop reason = signal SIGSEGV: invalid address (fault address: 0x50)
* frame #0: 0x00005555555773cf intpony`final_small(chunk=0x00007fffe6b851c0, force_finalisers_mask=4294967295) at heap.c:196:5
frame #1: 0x00005555555770f7 intpony`chunk_list(f=(intpony`final_small at heap.c:165), current=0x00007fffe6b851c0, mark=4294967295) at heap.c:377:5
frame #2: 0x0000555555577230 intpony`ponyint_heap_final(heap=0x00007fffe6c16a40) at heap.c:426:5
frame #3: 0x000055555556d248 intpony`ponyint_actor_final(ctx=0x00007ffff7cbbc48, actor=0x00007fffe6c16a00) at actor.c:862:3
frame #4: 0x0000555555570c67 intpony`final(ctx=0x00007ffff7cbbc48, self=0x00007ffff7cbbe00) at cycle.c:893:7
frame #5: 0x0000555555570af3 intpony`ponyint_cycle_terminate at cycle.c:1183:3
frame #6: 0x000055555557c66f intpony`ponyint_sched_shutdown at scheduler.c:1143:3
frame #7: 0x000055555557aaff intpony`ponyint_sched_start(library=false) at scheduler.c:1330:5
frame #8: 0x000055555557cf00 intpony`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x00007fffffffe8b0) at start.c:342:7
frame #9: 0x000055555556c477 intpony`main + 231
frame #10: 0x00007ffff7ce1290 libc.so.6`___lldb_unnamed_symbol3141 + 128
frame #11: 0x00007ffff7ce134a libc.so.6`__libc_start_main + 138
frame #12: 0x0000555555564ea5 intpony`_start at start.S:115
I have not seen the original segfault in the trace function for Array[I64] since removing the "memory" array from the Cpu actor.
At this point I am still wondering if this might be related to #1118, as well as how many distinct bugs I am seeing.
use "collections"
actor Main
new create(env: Env) =>
let mk = MaxKeeper
let waiter = CpuWaiter({()(env) =>
mk.query({(max: (I64 | None)) =>
match max
| let m: I64 =>
env.out.print("Problem 1: " + max.string())
else
env.err.print("No max")
end
})
})
// the larger this number is, the more likely a segfault
for iter in Range(0, 1000) do
var cpus = Array[Cpu]
for i in Range(0, 5) do
let cpu = Cpu
if i > 0 then
try
cpus(cpus.size() - 1)?.subscribe(cpu)
end
end
cpus.push(cpu)
end
try
cpus(cpus.size() - 1)?.subscribe(mk)
waiter.wait(cpus(cpus.size() - 1)?)
end
try
cpus(0)?.send(0)
end
for cpu in cpus.values() do
cpu.run()
end
end
waiter.start_waiting()
interface Sendable
fun tag send(word: I64)
actor Cpu is Sendable
var _sub: (Sendable tag | None) = None
var _on_halt: ({ref ()} iso | None) = None
var _blocked: Bool = false
var _step: U8 = 1
var _val: I64 = -1
be subscribe(rcvr: Sendable tag) =>
_sub = rcvr
be subscribe_halt(notify: {ref ()} iso) =>
_on_halt = consume notify
be step() =>
if _blocked then
return
end
match _step
| 1 =>
if _val >= 0 then
_step = 2
else
_blocked = true
end
| 2 =>
match _sub
| let rcvr: Sendable tag => rcvr.send(_val + 1)
end
_step = 3
| 3 =>
match _on_halt
| let rcvr: {ref ()} iso => rcvr()
end
_step = 4
end
if (not _blocked) and (_step < 4) then
step()
end
be run() =>
if not _blocked then
step()
end
be send(word: I64) =>
_val = word
if _blocked then
_blocked = false
step()
end
actor MaxKeeper
var _max: (I64 | None) = None
be send(word: I64) =>
_max = match _max
| let m: I64 => m.max(word)
else
word
end
be query(fn: {((I64 | None))} val) =>
fn(_max)
actor CpuWaiter
var _waiting: SetIs[Cpu tag] = SetIs[Cpu tag]
var _wait: Bool = false
let _when_done: {()} val
new create(when_done: {()} val) =>
_when_done = when_done
be wait(cpu: Cpu tag) =>
_waiting.set(cpu)
cpu.subscribe_halt({()(self = recover tag this end) =>
self._done(cpu)
})
be start_waiting() =>
_wait = true
if _waiting.size() == 0 then
_when_done()
end
be _done(cpu: Cpu tag) =>
_waiting.unset(cpu)
if _wait and (_waiting.size() == 0) then
_when_done()
end
I should also add that --ponynoblock still seems to remove "segmentation fault" from the set of possible outcomes, but doesn't change the termination behavior.
@derrickturk can you open an issue for "early termination"? That sounds like a distinctly different problem from the probably-cycle-detector-related segfault. I haven't paid a lot of attention to what you have reported for it; a good fresh summation on its own would be helpful.
I have a hard time reproducing the segfault issue using the last example. I have a very easy time reproducing the "early termination" issue. I almost never get output.
I have a much easier time reproducing all three cases with the original example in the zip file.
Using the code on my cd-contacted-bad branch, I am unable to reproduce the segfault with either example, which I was hopeful would be the case, as I fixed all the race conditions I could find in interactions with the cycle detector.
@derrickturk very much so, please do open a new issue for the termination problem and limit the information therein to it.
So the "early termination" problem:
there's no printing going on because, at the time the program ends, the waiting count is still rather high. And it is "all over the place".
Here's some output from my modified version.
Next up is to determine whether the bug is in the Pony code or the runtime. I'm assuming the runtime, but I want to rule out the Pony code first.
@redvers suggested during office hours to try running with --ponymaxthreads=1 and lo and behold, no "early termination". Why? Still unknown.
OK so as stated, the segfault is fixed by forthcoming changes.
The "early termination" is a "bug" in the program. The program works with a single ponythread because the message ordering with a single thread and how things are scheduled matches the sequential thinking behind the code. However, with more than 1 thread, the code is no longer sequential and invariants that the code doesn't state do not hold.
In particular, there is an expectation that each Cpu will have received and processed its subscribe_halt message (so _on_halt is set) before it reaches step 3, giving a one-to-one ratio of wait calls to _done calls.
However, this will only be true with a single thread, where you happen to have our scheduling rules (which, with a single thread, will satisfy most expectations about sequential program order).
However, with more than 1 thread, things can get "out of order" from what the program seems to expect to happen. In particular, at the moment that cpus(0)?.send(0) is first called, our first cpu starts its "steps processing" almost immediately. With a single thread this is not the case and everything is "ordered as expected".
When the first cpu "starts its processing", there are orderings where wait can be called to add the cpu to the waiting set, but the cpu in question will already have reached step 3 before it has received and processed the "subscribe_halt" message. This means that there will no longer be a one-to-one wait-to-_done ratio. What happens then is that when the cpu is at step 3, the _on_halt match fails because _on_halt is still None, so no _done message is sent, which means that this check will not pass:
be _done(cpu: Cpu tag) =>
  _waiting.unset(cpu)
  if _wait and (_waiting.size() == 0) then
    _when_done()
  end
because _waiting.size() will never be able to reach 0.
For the program to work as expected, the following ordering, which only happens with 1 thread, MUST hold. Within the for iter in Range(0, 1000) do loop... we need this to happen:
then we have:
for cpu in cpus.values() do
  cpu.run()
end
which means that while main is still running...
Because of this ordering, and corresponding orderings based on the scheduler implementation, there can never be a mismatch in set/unset calls in the waiter.
However, in a concurrent world, if we have 2 threads... the moment an actor hits scheduler 1's queue, it can be stolen by scheduler 2, and then there are so many unknown orderings, and all the race conditions in the code related to set/unset in the CpuWaiter can happen.
Note that with the deterministic scheduling of 1 thread, 4000 times "_on_halt" won't be set, but that is OK, because 1000 times it will be set and there are a corresponding 1000 wait calls that were made from main. (Each iteration creates 5 Cpus but only the last one is handed to waiter.wait, so across 1000 iterations, 4000 Cpus never get a subscribe_halt.)
The easiest fix for the problem is to move the setting of on_halt to before the first time it needs to be used, so here in main:
try
  cpus(cpus.size() - 1)?.subscribe(mk)
  waiter.wait(cpus(cpus.size() - 1)?)
end
you have access to the waiter and can do the subscribe_halt call on the last item in cpus at that time and give it the waiter object that is in scope.
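Something along these lines is what I have in mind (an untested sketch; it mirrors the lambda shape the program already uses and assumes _done is callable from Main, which it is here since everything is in one package, otherwise make it public):
try
  let last = cpus(cpus.size() - 1)?
  last.subscribe(mk)
  waiter.wait(last)
  // set the halt callback from Main, before send(0)/run() are ever sent,
  // so `last` is guaranteed to have _on_halt set before it can reach step 3
  last.subscribe_halt({()(waiter, last) => waiter._done(last) })
end
With that in place, CpuWaiter.wait only needs to add the cpu to _waiting and can drop its own subscribe_halt call.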
Opening apology: this is a much-reduced version of a solution to Advent of Code's 2019 day 7 puzzle, which is hard to explain in a vacuum but comes down to implementing a tiny VM with async I/O and running multiple instances of it in communication with each other. After cutting out a lot of extraneous things, I'm left with the attached source files and input. I'm reluctant to reduce it further because I'm starting to think the crash has nothing to do with the VM logic itself, but it's challenging to produce relevant input for a cut-down VM.
My notes refer to testing on both Windows 10 and Arch Linux via WSL2. In both cases ponyc is the latest release 0.51.2. This is a 4-core machine.
Run with:
./intpony input.txt
The three observed behaviors are:
- The only runtime option I've found to have any effect on this is --ponymaxthreads 1, which on Windows seemingly guarantees the intended output (with fast exit).
- Compiled with -d, I get additional possible outcomes including mismatches between the count of RUN outputs and DONE outputs. (This should not happen given the input - each VM/Cpu should run to successful halt.)
- The program ends in a segfault, usually, on either the release or debug binary. It's often reported (by gdb) in pony_os_peername on Windows, and in Array_I64_val_Trace on WSL2. Oddly, runs with no segfault also have no output, and successful runs produce output before segfaulting. With --ponymaxthreads 1, no segfault on Windows, but I still get segfaults on WSL2.
- The test program creates 100 Cpu actors total, each with a 519-"word" memory (i.e. an Array[I64] with 519 entries). This can be adjusted; it seems that segfaults get more likely as the number goes up. I've never seen a segfault with only 1 or 2 actors, but I have with 4 or 5.
Full stack trace from a crash on WSL2:
I've also seen:
The plot thickened with a suggestion on Zulip to run with --ponynoblock, disabling the cycle checker (IIUC). This resulted in "dropped output" about 50-75% of the time (rate maybe dependent on running under a debugger or not), but no segfaults.
intpony.zip