rsepassi / zigcoro

A Zig coroutine library
BSD Zero Clause License
85 stars 6 forks source link

SIGSEGV when stack overflow #15

Closed iacore closed 7 months ago

rsepassi commented 7 months ago

Interesting, so it looks like this is actually because of a type-incompatibility between *anyopaque and null that doesn't seem to be caught by the compiler.

This more minimal thing fails:

// Signature change
fn coroFnImpl(x: *usize, _: *anyopaque) usize {

// Call change
var frame = try libcoro.xasync(coroFnImpl, .{ &x, null}, stack);

While this works:

// Signature change
fn coroFnImpl(x: *usize, _: ?*anyopaque) usize {

// Call change
var frame = try libcoro.xasync(coroFnImpl, .{ &x, null}, stack);

The only difference between the 2 being: *anyopaque vs ?*anyopaque.

I'll take a look later on how to get this to error at compile time.

Let me know if you find that I've missed or misunderstood something.

iacore commented 7 months ago

Sorry. That's not where I found the bug.

Here's where I found the bug:

git clone https://codeberg.org/iacore/kanren.zig/ --branch zigcoro
cd kanren.zig
zig build # build test binary
zig-out/bin/test # SIGSEGV here

I don't know how to reduce this test case. The crash happened when restoring registers.

$ lldb zig-out/bin/test

(lldb) target create "zig-out/bin/test"
Current executable set to '/home/user/computing/lib/zig/kanren/zig-out/bin/test' (x86_64).
(lldb) r
Process 21654 launched: '/home/user/computing/lib/zig/kanren/zig-out/bin/test' (x86_64)
Test [6/6] test.sudoku_4x4... Process 21654 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7dc3cc8)
    frame #0: 0x000000000022953b test`main.unify(subst=<unavailable>, _l=<unavailable>, _r=<unavailable>) at main.zi
g:106
   103  };
   104  
   105  // always allocate new subst (if not null)
-> 106  pub fn unify(subst: SubstitutionMap, _l: Term, _r: Term) !?SubstitutionMap {
   107     const l: Term = subst.lookup(_l);
   108     const r: Term = subst.lookup(_r);
   109     // std.log.warn("unify: {} {}", .{ l.*, r.* });
(lldb) di -f -m

** 106  pub fn unify(subst: SubstitutionMap, _l: Term, _r: Term) !?SubstitutionMap {

test`:
    0x229530 <+0>:    pushq  %rbp
    0x229531 <+1>:    movq   %rsp, %rbp
    0x229534 <+4>:    subq   $0x830, %rsp              ; imm = 0x830 
->  0x22953b <+11>:   movq   %r8, -0x7d8(%rbp)
    0x229542 <+18>:   movq   %rcx, -0x800(%rbp)
    0x229549 <+25>:   movq   %rdx, %rax
    0x22954c <+28>:   movq   -0x800(%rbp), %rdx
    0x229553 <+35>:   movq   %rax, -0x7e0(%rbp)
    0x22955a <+42>:   movq   %rsi, -0x7f8(%rbp)
    0x229561 <+49>:   movq   %rdi, -0x7f0(%rbp)
    0x229568 <+56>:   movq   %rdi, -0x7e8(%rbp)

   104  
   105  // always allocate new subst (if not null)
rsepassi commented 7 months ago

Ah ok. First thing to try is increasing the coroutine’s stack size to some high number just to see if this is an uncaught stack overflow. Have you tried that?

iacore commented 7 months ago

it is indeed stack overflow

rsepassi commented 7 months ago

Ok cool. Unfortunately this is not catchable in the general case. Sometimes we get lucky and it is detectable and you get a stack overflow error but other times you just get a crash and need to know to adjust the stack size.

iacore commented 7 months ago

It's not possible to catch normal stack overflow in Zig?

rsepassi commented 7 months ago

Normally it is, but these userspace coroutines use their own stacks and the compiler isn’t able to reason about them the same way. This of course will all be fine once Zig ships native async and zigcoro will (for the most part) not be necessary.

On Sun, Nov 19, 2023 at 4:40 AM iacore @.***> wrote:

It's not possible to catch normal stack overflow in Zig?

— Reply to this email directly, view it on GitHub https://github.com/rsepassi/zigcoro/issues/15#issuecomment-1817843324, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIQMW2I5H66ROJGMUBYMCDYFH44LAVCNFSM6AAAAAA7G56KW6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJXHA2DGMZSGQ . You are receiving this because you modified the open/close state.Message ID: @.***>

iacore commented 7 months ago

I'm curious. How does Zig catch stack overflow?

rsepassi commented 7 months ago

The compiler can compute the stack size needed per function/branch and knows how big the stack is. I think. Maybe ask more in the zig discord.