sustrik / libdill

Structured concurrency in C
MIT License

testsuite core dumps on FreeBSD #63

Closed bapt closed 7 years ago

bapt commented 7 years ago

Hi, I have been testing libdill on FreeBSD (to make a port out of it).

Running the test suite, everything passes except the "fd" example, which dies with: "Assert failed: !dill_slist_empty(&ctx->ready) (cr.c:405)"

I'm testing on FreeBSD-CURRENT with clang 3.9.0 (note that I have tried both enabling and disabling SSP).

raedwulf commented 7 years ago

Is it doing the same with GCC?

bapt commented 7 years ago

yes

bapt commented 7 years ago

Hmm, note that it does not fail on FreeBSD 11 release, so it is probably an issue on the FreeBSD side. I'll dig more and close this if it is not a libdill problem.

raedwulf commented 7 years ago

What version of FreeBSD were you testing? It could also be something to do with the thread local storage (i.e. something they may have fixed in FreeBSD 11?). That has to have OS-support as well.

sustrik commented 7 years ago

Also, are you running from master or the packaged 1.0 version? I've made some changes in the past couple of days that may have caused that error.

sustrik commented 7 years ago

Current master doesn't fail on FreeBSD 10.3 (clang 3.4.1).

avsej commented 7 years ago

I can confirm that FreeBSD 11.0-RELEASE-p5 with gcc 4.9.4 does not fail on make check, but I see a similar failure with ./tutorial/step4:

Assert failed: !dill_qlist_empty(&ctx->ready) (cr.c:409)
Abort trap (core dumped)

Also, make check on dsock fails with the same error:

FAIL: tests/keepalive
=====================

msend(6, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(12, 0x4b, 1)
Assert failed: !dill_qlist_empty(&ctx->ready) (cr.c:409)
FAIL tests/keepalive (exit status: 134)

avsej commented 7 years ago

gdb does not show anything past dill_wait()

#0  0x0000000800d4355a in thr_kill () from /lib/libc.so.7
#1  0x0000000800d4352b in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x0000000800d43499 in abort () at /usr/src/lib/libc/stdlib/abort.c:65
#3  0x0000000800a76f1c in dill_wait () at cr.c:409
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

ctx in dill_wait() equals 0xffffffffffffffe8

avsej commented 7 years ago

In the example

#include <libdill.h>
#include <dsock.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

coroutine void dialogue(int s) {
    int rc = msend(s, "What's your name?", 17, -1);
    if (rc != 0) {
      goto cleanup;
    }
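    char inbuf[256];
    /* Leave room for a terminating NUL; the received bytes are not
       guaranteed to be NUL-terminated. */
    ssize_t sz = mrecv(s, inbuf, sizeof(inbuf) - 1, 1000);
    if (sz < 0) {
      goto cleanup;
    }
    inbuf[sz] = '\0';
    printf("name: %s\n", inbuf);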

  cleanup:
    rc = hclose(s);
    assert(rc == 0);
}

int main(int argc, char *argv[]) {
  int port = 5555;
  if (argc > 1) {
    port = atoi(argv[1]);
  }

  ipaddr addr;
  int rc = ipaddr_local(&addr, NULL, port, 0);
  assert(rc == 0);
  int ls = tcp_listen(&addr, 10);
  if (ls < 0) {
    perror("Can't open listening socket");
    return 1;
  }

  while (1) {
    int s = tcp_accept(ls, NULL, -1);
    assert(s >= 0);
    printf("New connection!\n");
    s = crlf_start(s);
    assert(s >= 0);
    go(dialogue(s));
  }

  return 0;
}

it fails right at the return from the dialogue() function

sustrik commented 7 years ago

It seems that the full stacks are visible when compiled with DILL_ARCH_FALLBACK. @raedwulf: Any idea why the asm implementation may confuse the debugger?

raedwulf commented 7 years ago

I'm not sure - I have included the DWARF call frame information, so it should work (as it does on Cygwin and Linux). I'm looking into it right now on my FreeBSD box.

sustrik commented 7 years ago

I've seen trimmed stacks even on Linux. After recompilation with DILL_ARCH_FALLBACK the full stack was visible. The trimmed stack always ends with dill_wait().

raedwulf commented 7 years ago

There must be a bug somewhere in the .cfi_offset I've specified - I'm looking into it

avsej commented 7 years ago

In dill_wait everything seems fine before

397             if(!dill_qlist_empty(&ctx->ready)) {

[image]

but after that, ctx already equals 1 and the next step just segfaults. [image]

The application has only one thread. What could go wrong there?

raedwulf commented 7 years ago

Just to be clear, there's two issues here:

sustrik commented 7 years ago

I think dill_qlist_empty is a red herring. What's important is that there's a setjmp just before that. We are probably seeing a bad interaction between a particular implementation of pthreads and libdill's split stacks. (For example: if pthreads accesses thread-local vars by walking up the stack until it reaches the top, it just won't work.)

To test the theory: Compile with --disable-threads and check whether the problem persists.

raedwulf commented 7 years ago

@sustrik I'm still seeing this with --disable-threads but I need to run --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK" to double check everything

sustrik commented 7 years ago

@raedwulf: Do you mean issue 1 or issue 2? Can we move the debugging thing into a separate issue to avoid confusion?

avsej commented 7 years ago

I just noticed that it is using poll instead of kqueue, is that correct?

raedwulf commented 7 years ago

That's not correct - although poll shouldn't error either (but that's a 3rd issue)...

raedwulf commented 7 years ago

Okay let's split this up (I'll create a new one for the stack-trimming problem)

sustrik commented 7 years ago

There's already an issue for the poll thing (https://github.com/sustrik/libdill/issues/43), and it also happens on Linux.

That being said, why is it using poll in the first place? On FreeBSD it should use kqueue.

avsej commented 7 years ago

I recompiled libdill with --disable-threads; the problem is still there.

sustrik commented 7 years ago

That's actually good news. A problem with the pthread implementation would probably be harder to fix.

raedwulf commented 7 years ago

I'm getting more test-suite fails on dsock with --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK" than with the assembly implementation. I'm on FreeBSD 10.3.

avsej commented 7 years ago

@sustrik my FreeBSD does not define the BSD macro :) Can we test for __FreeBSD__? I can create a separate PR.

avsej commented 7 years ago

Or I can add a define to the kqueue feature test in configure.ac.

raedwulf commented 7 years ago

Does that fix the issue?

raedwulf commented 7 years ago

Yes please make a PR.

avsej commented 7 years ago

Checking whether it fixes the issue, then I will do the PR.

avsej commented 7 years ago

Wow, it doesn't even compile with kqueue. I will fix that and add it to the PR.

raedwulf commented 7 years ago

Yeah! I just noticed that! It appears that some aspects of kqueue on FreeBSD and kqueue on OS X must be different.

raedwulf commented 7 years ago

Is it just me, or is the entire build defaulting to poll because neither DILL_KQUEUE nor DILL_EPOLL is defined anywhere???

It defines DILL_NO_KQUEUE and DILL_NO_EPOLL ... not the other way round :P

avsej commented 7 years ago

They are not supposed to be defined, but rather left for the user to override. The system is supposed to autodetect everything in the second section.

BTW the PR is https://github.com/sustrik/libdill/pull/65
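
For readers following along, the detection scheme discussed above can be sketched roughly as follows. This is only an illustration, not libdill's actual source: SKETCH_POLLER and sketch_poller_name() are hypothetical names, and the list of platform macros is an assumption. Only DILL_NO_EPOLL and DILL_NO_KQUEUE come from this thread. Note that BSD is normally defined by <sys/param.h>, so a check relying on it alone sees nothing unless that header has been included, whereas __FreeBSD__ is predefined by the compiler on FreeBSD.

```c
/* Hypothetical sketch of poller autodetection; not libdill's real code. */

/* Opt-out switches the user may pass, e.g. CFLAGS="-DDILL_NO_KQUEUE".
   They are deliberately NOT defined here, so autodetection runs. */

#if defined(__linux__) && !defined(DILL_NO_EPOLL)
/* Linux: prefer epoll unless the user opted out. */
#define SKETCH_POLLER "epoll"
#elif (defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
       defined(__DragonFly__) || defined(__APPLE__) || defined(BSD)) && \
      !defined(DILL_NO_KQUEUE)
/* BSD family and macOS: prefer kqueue. The compiler-predefined macros
   are tested first so that the result does not depend on whether
   <sys/param.h> (which defines BSD) was included. */
#define SKETCH_POLLER "kqueue"
#else
/* Everything else, or both opt-outs given: fall back to poll. */
#define SKETCH_POLLER "poll"
#endif

/* Returns the name of the poller selected at compile time. */
const char *sketch_poller_name(void) {
    return SKETCH_POLLER;
}
```

With a layout like this, nothing positive such as DILL_KQUEUE needs to be defined by the build; the opt-out macros only narrow what the autodetection may pick.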

avsej commented 7 years ago

@sustrik switching to kqueue fixes this particular issue. I guess this implicitly confirms the issues on Linux when it falls back to poll.

raedwulf commented 7 years ago

"they are not supposed to be defined, but rather left for user to override."

This would imply that the .travis.yml build script is wrong... @sustrik ?

raedwulf commented 7 years ago

@avsej are the dsock tests passing for you?

avsej commented 7 years ago

libdill tests are passing (they were also passing before the fix), but dsock tests are failing in a different way:

FAIL: tests/tcp
===============

Assertion failed: (ls >= 0), function main, file tests/tcp.c, line 71.
FAIL tests/tcp (exit status: 134)

FAIL: tests/pfx
===============

Assertion failed: (ls >= 0), function main, file tests/pfx.c, line 58.
FAIL tests/pfx (exit status: 134)

FAIL: tests/crlf
================

Assertion failed: (ls >= 0), function main, file tests/crlf.c, line 58.
FAIL tests/crlf (exit status: 134)

And my example works

raedwulf commented 7 years ago

Yes that's the exact same test failing for me. Oh sorry, I only have crlf failing for me... the others work.

avsej commented 7 years ago

Sorry, they are also green. I forgot to kill the example listening on port 5555 ;)
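
The three dsock failures above share the assertion ls >= 0, which is what one would expect if the port the tests listen on was still held by another process. As a rough illustration using plain BSD sockets rather than dsock (second_bind_errno() is a hypothetical helper, and whether the dsock tests hit exactly this path is an assumption), a second bind to a port that already has a listener fails:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listens on an ephemeral loopback port, then tries
   to bind a second socket to the same port. Returns the errno of the
   second bind (0 if it unexpectedly succeeded) or -1 if setup failed. */
int second_bind_errno(void) {
    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    socklen_t len = sizeof(addr);
    if (bind(first, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(first, 1) != 0 ||
        getsockname(first, (struct sockaddr *)&addr, &len) != 0) {
        close(first);
        return -1;
    }

    /* addr now holds the port the first socket is listening on. */
    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0) {
        close(first);
        return -1;
    }

    int err = 0;
    if (bind(second, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        err = errno;

    close(second);
    close(first);
    return err;
}
```

On a typical system the value returned is EADDRINUSE, so a test that asserts on the listening socket right after creating it would abort in the same way.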

avsej commented 7 years ago

So, to recap the observations: 1) switching to kqueue fixes the ctx assertion reported in this ticket; 2) the poll implementation seems to be buggy and segfaults when Linux or FreeBSD falls back to it.

sustrik commented 7 years ago

Ok, so as far as I understand, we can close this ticket and keep #43 for the POLL issue.

Or is there anything else remaining to be fixed on FreeBSD?

avsej commented 7 years ago

FreeBSD is good now. The ticket could be closed

bapt commented 7 years ago

Building the latest master on FreeBSD-CURRENT, I no longer have the issue when building with clang (3.9.0), but I still have the exact same issue with gcc 4.9.4.

raedwulf commented 7 years ago

Could you test with ./configure --disable-threads and ./configure --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK -fno-stack-protector"?

bapt commented 7 years ago

With gcc:
./configure: 1 failure
./configure --disable-threads: same failure
./configure --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK -fno-stack-protector":

make  check-TESTS
Segmentation fault (core dumped)
FAIL: tests/example
Segmentation fault (core dumped)
FAIL: tests/go
Segmentation fault (core dumped)
FAIL: tests/fd
PASS: tests/handle
Segmentation fault (core dumped)
FAIL: tests/chan
Segmentation fault (core dumped)
FAIL: tests/choose
Segmentation fault (core dumped)
FAIL: tests/sleep
Segmentation fault (core dumped)
FAIL: tests/signals
Segmentation fault (core dumped)
FAIL: tests/overload
PASS: tests/rbtree

With clang:
./configure: all OK
./configure --disable-threads: all OK
./configure --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK -fno-stack-protector": all OK

raedwulf commented 7 years ago

Can you create a new issue for this? I'll work on it after I fix #64, which should help with debugging this issue.

bapt commented 7 years ago

done in #67