bapt closed this issue 7 years ago
Is it doing the same with GCC?
yes
Hmm, note that it does not fail on FreeBSD 11 release, so it is probably an issue on the FreeBSD side directly. I'll dig more and close this if it is not a libdill problem.
What version of FreeBSD were you testing? It could also be something to do with the thread local storage (i.e. something they may have fixed in FreeBSD 11?). That has to have OS-support as well.
Also, are you running from master or the packaged 1.0 version? I've made some changes in the past couple of days that may have caused that error.
Current master doesn't fail on FreeBSD 10.3 (clang 3.4.1).
I can confirm that FreeBSD 11.0-RELEASE-p5 with gcc 4.9.4 does not fail on make check, but I see a similar failure with ./tutorial/step4:
Assert failed: !dill_qlist_empty(&ctx->ready) (cr.c:409)
Abort trap (core dumped)
Also, make check on dsock fails with the same error:
FAIL: tests/keepalive
=====================
msend(6, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(10, 0x4b, 1)
msend(7, 0x4b, 1)
mrecv(7, 0x4b, 1)
msend(12, 0x4b, 1)
Assert failed: !dill_qlist_empty(&ctx->ready) (cr.c:409)
FAIL tests/keepalive (exit status: 134)
gdb does not show anything past dill_wait():
#0 0x0000000800d4355a in thr_kill () from /lib/libc.so.7
#1 0x0000000800d4352b in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2 0x0000000800d43499 in abort () at /usr/src/lib/libc/stdlib/abort.c:65
#3 0x0000000800a76f1c in dill_wait () at cr.c:409
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
ctx in dill_wait() equals 0xffffffffffffffe8
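For what it's worth, 0xffffffffffffffe8 is the 64-bit two's-complement encoding of -24. That might hint that ctx was computed as a small negative offset from a zero base (for instance a thread-local base that was never set up), but that interpretation is only a guess. The arithmetic itself can be checked with a small helper; as_signed_offset and example_check are hypothetical names for illustration, not part of libdill:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret a raw 64-bit pointer value as a signed
   offset from address zero, without relying on implementation-defined
   unsigned-to-signed conversion. */
int64_t as_signed_offset(uint64_t raw) {
    if (raw > (uint64_t)INT64_MAX) {
        /* Two's complement: value = -(~raw) - 1 */
        return -(int64_t)(~raw) - 1;
    }
    return (int64_t)raw;
}

/* Self-check using the value of ctx reported above. */
void example_check(void) {
    assert(as_signed_offset(UINT64_C(0xffffffffffffffe8)) == -24);
}
```

Calling example_check() from a test program passes silently if the reading as -24 is right.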
In the example:
#include <libdill.h>
#include <dsock.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

coroutine void dialogue(int s) {
    int rc = msend(s, "What's your name?", 17, -1);
    if (rc != 0) {
        goto cleanup;
    }
    char inbuf[256];
    ssize_t sz = mrecv(s, inbuf, sizeof(inbuf), 1000);
    if (sz < 0) {
        goto cleanup;
    }
    printf("name: %s\n", inbuf);
cleanup:
    rc = hclose(s);
    assert(rc == 0);
}

int main(int argc, char *argv[]) {
    int port = 5555;
    if (argc > 1) {
        port = atoi(argv[1]);
    }
    ipaddr addr;
    int rc = ipaddr_local(&addr, NULL, port, 0);
    assert(rc == 0);
    int ls = tcp_listen(&addr, 10);
    if (ls < 0) {
        perror("Can't open listening socket");
        return 1;
    }
    while (1) {
        int s = tcp_accept(ls, NULL, -1);
        assert(s >= 0);
        printf("New connection!\n");
        s = crlf_start(s);
        assert(s >= 0);
        go(dialogue(s));
    }
    return 0;
}
it fails right at the return from the dialogue() function.
It seems that the full stacks are visible when compiled with DILL_ARCH_FALLBACK. @raedwulf: Any idea why the asm implementation may confuse the debugger?
I'm not sure - I have included the dwarf callback information so it should work (as it does on cygwin and linux). I'm looking into it right now on my FreeBSD box.
I've seen trimmed stacks even on Linux. After recompilation with DILL_ARCH_FALLBACK the full stack was visible. The stack always ends with dill_wait().
There must be a bug somewhere in the .cfi_offset I've specified - I'm looking into it.
In dill_wait() everything seems good before
397 if(!dill_qlist_empty(&ctx->ready)) {
but after that, ctx already equals 1 and the next step just segfaults.
The application has only one thread. What could go wrong there?
Just to be clear, there are two issues here:
I think dill_qlist_empty is a red herring. What's important is that there's a setjmp just before that. We are probably seeing a bad interaction between a particular implementation of pthreads and libdill's split stacks. (For example: if pthreads accesses thread-local vars by jumping up the stack until it reaches the top, it just won't work.)
To test the theory: Compile with --disable-threads and check whether the problem persists.
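As a reminder of why a setjmp right before the failing line matters, here is a minimal sketch of setjmp "returning twice" on an ordinary single stack. It does not reproduce libdill's coroutine switching or the split stacks discussed above; example_setjmp_roundtrip is a hypothetical name used only for illustration. Note the volatile local: without it, the value of a local modified between setjmp and longjmp is indeterminate after the jump.

```c
#include <setjmp.h>

/* Hypothetical illustration, not libdill code: control passes through the
   line after the if-block exactly once, but only after setjmp has returned
   a second time via longjmp. */
int example_setjmp_roundtrip(void) {
    jmp_buf env;
    volatile int passes = 0;   /* volatile so the value survives longjmp */

    if (setjmp(env) == 0) {
        /* First return from setjmp: direct call. */
        passes = 1;
        longjmp(env, 1);       /* jumps back; setjmp now returns 1 */
    }

    /* Reached only after the longjmp above. */
    passes += 1;
    return passes;
}
```

Any state that the saved context silently depends on (registers, a stack pointer, a thread-local base) has to still be valid when the second return happens, which is where a mismatch with the platform could plausibly bite.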
@sustrik I'm still seeing this with --disable-threads, but I need to run --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK" to double check everything.
@raedwulf: Do you mean issue 1 or issue 2? Can we move the debugging thing into a separate issue to avoid confusion?
I just noticed that it is using poll instead of kqueue, is that correct?
That's not correct - although poll shouldn't error either (but that's a 3rd issue)...
Okay let's split this up (I'll create a new one for the stack-trimming problem)
There's already an issue for the poll thing (https://github.com/sustrik/libdill/issues/43) and it also happens on Linux.
That being said, why is it using poll in the first place? On FreeBSD it should use kqueue.
I recompiled libdill with --disable-threads, and the problem is still there.
That's actually good news. A problem with the pthread implementation would probably be harder to fix.
I'm getting more test-suite fails on dsock with --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK" than with the assembly implementation. I'm on FreeBSD 10.3.
@sustrik my FreeBSD does not define the BSD macro :) Can we test for __FreeBSD__? I can create a separate PR. Or I can add a define to the kqueue feature test in configure.ac.
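A rough sketch of what testing for __FreeBSD__ could look like, assuming compiler-predefined platform macros only. As far as I know, BSD normally comes from <sys/param.h> rather than being predefined by the compiler, which would explain why it is missing unless that header is included. The EXAMPLE_* macros and example_poller_name are hypothetical names, not libdill's actual feature macros:

```c
/* Hypothetical feature detection; libdill's real macros may differ. */
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define EXAMPLE_HAVE_KQUEUE 1
#elif defined(__linux__)
#define EXAMPLE_HAVE_EPOLL 1
#endif

/* Report which polling backend the detection above would select.
   Falls back to poll when neither kqueue nor epoll was detected. */
const char *example_poller_name(void) {
#if defined(EXAMPLE_HAVE_KQUEUE)
    return "kqueue";
#elif defined(EXAMPLE_HAVE_EPOLL)
    return "epoll";
#else
    return "poll";
#endif
}
```

Printing example_poller_name() on the affected FreeBSD box would show whether the build would still silently fall back to poll. Doing the detection in configure.ac instead is the other option mentioned above and avoids hard-coding platform lists in the source.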
Does that fix the issue?
Yes please make a PR.
Checking whether it fixes the issue, and then I will do the PR.
Wow, it doesn't even compile with kqueue. I will fix that and add it to the PR.
Yeah! I just noticed that! It appears that some aspects of kqueue on FreeBSD and kqueue on OS X must be different.
Is it me, or is the entire build defaulting to poll because neither DILL_KQUEUE nor DILL_EPOLL is defined anywhere???
It defines DILL_NO_KQUEUE and DILL_NO_EPOLL ... not the other way round :P
They are not supposed to be defined, but rather left for the user to override. The system is supposed to autodetect everything in the second section.
BTW the PR is https://github.com/sustrik/libdill/pull/65
@sustrik switching to kqueue fixes this particular issue. I guess this implicitly confirms the issues seen on Linux when it falls back to poll.
they are not supposed to be defined, but rather left for user to override.
This would imply that the .travis.yml build script is wrong... @sustrik ?
@avsej are the dsock tests passing for you?
libdill tests are passing (they were also passing before the fix), but dsock tests are failing in a different way:
FAIL: tests/tcp
===============
Assertion failed: (ls >= 0), function main, file tests/tcp.c, line 71.
FAIL tests/tcp (exit status: 134)
FAIL: tests/pfx
===============
Assertion failed: (ls >= 0), function main, file tests/pfx.c, line 58.
FAIL tests/pfx (exit status: 134)
FAIL: tests/crlf
================
Assertion failed: (ls >= 0), function main, file tests/crlf.c, line 58.
FAIL tests/crlf (exit status: 134)
And my example works.
Yes, that's the exact same test failing for me. Oh sorry, I only have crlf failing for me... the others work.
Sorry, they are also green. I forgot to kill the example on port 5555 ;)
So to recap the observations: 1) switching to kqueue fixes the ctx assertion reported in this ticket; 2) the poll implementation seems to be buggy and segfaults when Linux or FreeBSD falls back to it.
Ok, so as far as I understand, we can close this ticket and keep #43 for the POLL issue.
Or is there anything else remaining to be fixed on FreeBSD?
FreeBSD is good now. The ticket can be closed.
Building the latest master on FreeBSD current, I no longer have the issue when building with clang (3.9.0), but I still have the exact same issue with gcc 4.9.4.
Could you test with ./configure --disable-threads and ./configure --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK -fno-stack-protector"?
With gcc:
./configure: 1 failure
./configure --disable-threads: same failure
./configure --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK -fno-stack-protector":
make check-TESTS
Segmentation fault (core dumped)
FAIL: tests/example
Segmentation fault (core dumped)
FAIL: tests/go
Segmentation fault (core dumped)
FAIL: tests/fd
PASS: tests/handle
Segmentation fault (core dumped)
FAIL: tests/chan
Segmentation fault (core dumped)
FAIL: tests/choose
Segmentation fault (core dumped)
FAIL: tests/sleep
Segmentation fault (core dumped)
FAIL: tests/signals
Segmentation fault (core dumped)
FAIL: tests/overload
PASS: tests/rbtree
With clang:
./configure: all OK
./configure --disable-threads: all OK
./configure --disable-threads CFLAGS="-DDILL_ARCH_FALLBACK -fno-stack-protector": all OK
Can you create a new issue for this? I'll work on it after I fix #64, which should help with debugging this issue.
Done in #67.
Hi, I have been testing libdill on FreeBSD (to make a port out of it).
Running the test suite, everything is OK except the "fd" example, which dies with: "Assert failed: !dill_slist_empty(&ctx->ready) (cr.c:405)"
I'm testing on FreeBSD current with clang 3.9.0 (note that I have tried both enabling and disabling SSP).