ivmai / bdwgc

The Boehm-Demers-Weiser conservative C/C++ Garbage Collector (bdwgc, also known as bdw-gc, boehm-gc, libgc)
https://www.hboehm.info/gc/
Other
2.98k stars 407 forks source link

Support GC_init (and GC_get_stack_base) from non-main thread on FreeBSD #180

Closed dimpase closed 7 years ago

dimpase commented 7 years ago

There are warnings such as "FreeBSD does not yet fully support threads with Boehm GC" issued while building on FreeBSD, but what exactly is not supported cannot be found, at least I couldn't found anything concrete online.

I am attempting to port to FreeBSD a Python (CPython, to be precise) extension that uses Boehm GC, and face rather mysterious crashes of the GC in a threaded setting.

Having a bit more documentation/examples on this would be great.

Edit: the warnings were removed in a recent (April 2017) commit 1e40b9b0494d1dd2d289f71799e1d5d6dec4774f

dimpase commented 7 years ago

On the topic, the most recent I have found is this thread - dated April 2012, that stops inconclusively, it seems.

ivmai commented 7 years ago

To @magv , @paurkedal : could you reply, please? To: @dimpase : do you observe the crashes with the latest master?

dimpase commented 7 years ago

It's gc-7.6.0 with the libatomic_ops-7.4.4; although more or less identical crashes are observed with gc-7.2. It also does not seem to matter whether it's built with clang-3.8, clang 5.0, or with gcc6.

dimpase commented 7 years ago

the platform is FreeBSD 11.0-RELEASE-p1 #0 r306420, x86_64, running in a VMWare on a Linux box.

paurkedal commented 7 years ago

@dimpase Can you check if the thread_stress_test from https://github.com/paurkedal/bdwgc/tree/thread_stress_test also crashes? This is a version of disclam_test where I removed the disclaim notifiers, to check whether those are involved.

dimpase commented 7 years ago

All the tests from that branch pass. (Built with gcc6, without libatomic_ops).

However, if used in the application in question, this branch gives essentially the same segfault as I saw earlier. In a nutshell, depending on the thread (good in the "main" CPython thread, crashes if it's coming from a thread responsible for tab completion) it is launched from, GC either initialises and works (at least in part, there are more crashes coming later, again depending upon the thread), or crashed at initialisation (launched from the embedded Lisp interpreter ECL), like this:

Program received signal SIGSEGV, Segmentation fault.
GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1608
1608    extra/../mark.c: No such file or directory.
(gdb) bt
#0  GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1608
#1  0x000000087eb2b8ad in GC_push_all_stacks () at extra/../pthread_stop_world.c:573
#2  0x000000087eb27cb5 in GC_mark_some (cold_gc_frame=0x7fffdeff7e80 " \265\327~\b") at extra/../mark.c:427
#3  0x000000087eb2801d in GC_stopped_mark (stop_func=stop_func@entry=0x87eb195c0 <GC_never_stop_func>) at extra/../alloc.c:707
#4  0x000000087eb291ae in GC_try_to_collect_inner (stop_func=0x87eb195c0 <GC_never_stop_func>) at extra/../alloc.c:493
#5  0x000000087eb2a84d in GC_init () at extra/../misc.c:1290
#6  0x000000087e7cde80 in init_alloc () at /usr/home/dima/Sage/sage/local/var/tmp/sage/build/ecl-16.1.2.p4/src/src/c/alloc_2.d:786
#7  0x000000087e69622d in cl_boot (argc=argc@entry=1, argv=argv@entry=0x7fffdeff7fc8)
    at /usr/home/dima/Sage/sage/local/var/tmp/sage/build/ecl-16.1.2.p4/src/src/c/main.d:521
#8  0x000000087e20a728 in __pyx_pf_4sage_4libs_3ecl_4init_ecl (__pyx_self=<optimized out>) at build/cythonized/sage/libs/ecl.c:5417
....
#69 0x0000000800a809f3 in PyObject_Call (func=func@entry=, arg=arg@entry=, kw=<optimized out>) at Objects/abstract.c:2547
#70 0x0000000800b34937 in PyEval_CallObjectWithKeywords (func=, arg=, kw=<optimized out>) at Python/ceval.c:4221
#71 0x0000000800b80272 in t_bootstrap (boot_raw=0x8799a8440) at ./Modules/threadmodule.c:620
#72 0x0000000801066b55 in ?? () from /lib/libthr.so.3
...

(sorry --- a non-debugging build ATM). And this all works on Linux, Windows/cygwin64, and OSX.

ivmai commented 7 years ago

thread responsible for tab completion

Is this thread created by GC_pthread_create, or by pthread_create (not intercepted by GC) with a manual thread registration by GC_register_my_thread? In the latter case, is it manually unregistered?

Incremental mode is off (i.e. GC_enable_incremental is not called), right?

0 GC_push_all_eager (bottom=, top=) at extra/../mark.c:1608

Does the region bottom..top belong to the thread which is alive and stopped by GC?

Please compile GC with --enable-gc-assertions (if not yet).

dimpase commented 7 years ago

On Tue, Sep 26, 2017 at 5:32 PM, Ivan Maidanski notifications@github.com wrote:

thread responsible for tab completion

Is this thread created by GC_pthread_create, or by pthread_create (not intercepted by GC) with a manual thread registration by GC_register_my_thread? In the latter case, is it manually unregistered?

none of this. In a nutshell: we run CPython and load a Python extension, which has GC embedded. There are two scenarios:

1) loading happens at the main Python "thread", and in this case GC successfully initialises.

2) loading happens with a separate thread spawned by Python (line #71 in the backtrace) if one hits TAB to complete certain Python expression related to the Python extension, and in this case GC crashes, as shown on the backtrace.

That is, unless I miss something, there is nothing to register - or should one register the parent thread with the GC?

Incremental mode is off (i.e. GC_enable_incremental is not called), right?

Correct.

0 GC_push_all_eager (bottom=, top=) at extra/../mark.c:1608

Does the region bottom..top belong to the thread which is alive and stopped by GC?

hmm, is there any thread it must stop?

Please compile GC with --enable-gc-assertions (if not yet).

I will.

ivmai commented 7 years ago

none of this. In a nutshell: we run CPython and load a Python extension, which has GC embedded. hmm, is there any thread it must stop?

Is GC used from a single thread? Does the region bottom..top belong to this thread?

dimpase commented 7 years ago

On Tue, Sep 26, 2017 at 6:12 PM, Ivan Maidanski notifications@github.com wrote:

none of this. In a nutshell: we run CPython and load a Python extension, which has GC embedded. hmm, is there any thread it must stop?

Is GC used from a single-thread?

no, we don't have control over threads (plus the "main" thread, not sure this is right terminology) GC is initialised in and used in. Python has its own multithreading facility, which may use pthreads, so our extension may be started/used in several of these.

For some reason on Linux/OSX we don't need to register/de-register these threads, it "just works", but on FreeBSD it appears to be no way around it.

Does the region bottom..top belong to this thread?

By right, it should - this must be the memory area "belonging" to the ECL Lisp interpreter we are loading, and which in turn starts GC.

Is there a function call one can use to check this?

ivmai commented 7 years ago

For some reason on Linux/OSX we don't need to register/de-register these threads, it "just works"

GC tracks threads on OS X automatically. On Linux, pthread_create might be intercepted by GC. Please compile GC with -D DEBUG_THREADS (it will print "About to start new thread from thread" on every intercepted pthread_create).

Is there a function call one can use to check this?

I think no but -D DEBUG_THREADS should log the relevant information - grep for "Pushing stacks from thread ...", "About to start new thread from thread ..."

dimpase commented 7 years ago

Does this mean that in scenario 2) GC might be crashing due to an attempt to do a GC_push... on the main thread area, without registering it first?

ivmai commented 7 years ago

Does this mean that in scenario 2) GC might be crashing due to an attempt to do a GC_push... on the main thread area, without registering it first?

Maybe. Please What are the top/bottom values in GC_push_all_eager (bottom=, top=), what is the value of self and p->id in GC_push_all_stacks at crash? Was p->id registered (i.e. does the log contain "About to start new thread from thread id>")?

dimpase commented 7 years ago

Running scenario 2), I set breakpoint 1 at GC_push_all_eager and see the following; it crashes after the 2nd time the breakpoint is reached. (I've also set breakpoints at GC_push_all_stacks to see self, p, p->id, nthreads. no surprises here - nthreads=1) And no "About to start new thread from thread id>" message printed.

Creating thread 0x801e18300
Starting mark helper for mark number 0
Starting mark helper for mark number 0
Starting mark helper for mark number 0
Starting mark helper for mark number 0
Starting mark helper for mark number 0
Starting mark helper for mark number 0
Starting mark helper for mark number 0
Stopping the world from 0x801e18300
World stopped from 0x801e18300

Breakpoint 1, GC_push_all_eager (bottom=0x7fffdf7f6c70 "\220l\177\337\377\177", top=0x7fffdf7f70d0 "\020q\177\337\377\177") at extra/../mark.c:1593
1593    extra/../mark.c: No such file or directory.
(gdb) cont
Continuing.
...
Breakpoint 3, GC_push_all_stacks () at extra/../pthread_stop_world.c:522
522     in extra/../pthread_stop_world.c
(gdb) p self
$1 = (pthread_t) 0x801e18300
...
Pushing stacks from thread 0x801e18300
...
Breakpoint 5 at 0x882bd2e96: file extra/../pthread_stop_world.c, line 566.
(gdb) cont
Continuing.
Stack for thread 0x801e18300 = [0x7fffdf7f7010,0x7ffffffff000)

Breakpoint 5, GC_push_all_stacks () at extra/../pthread_stop_world.c:566
566     in extra/../pthread_stop_world.c
(gdb) p p
$8 = (GC_thread) 0x882e21ee0 <first_thread>
(gdb) p nthreads
$9 = 1
(gdb) p p->id
$10 = (pthread_t) 0x801e18300
...
(gdb) p self
$12 = (pthread_t) 0x801e18300
(gdb) cont
Continuing.

Breakpoint 1, GC_push_all_eager (bottom=0x7fffdf7f7010 "pp\177\337\377\177", top=0x7ffffffff000 "\377T$\020\215D$ P\270\241\001") at extra/../mark.c:1593
1593    extra/../mark.c: No such file or directory.
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
GC_push_all_eager (bottom=0x7fffdf7f7010 "pp\177\337\377\177", top=0x7ffffffff000 "\377T$\020\215D$ P\270\241\001") at extra/../mark.c:1608
1608    in extra/../mark.c
(gdb) p p
$13 = (word *) 0x7fffdfffe000
ivmai commented 7 years ago

Stack for thread 0x801e18300 = [0x7fffdf7f7010,0x7ffffffff000)

The 2nd value is not correct, it's the upper boundary of the primordial thread (not the current one). It is obtained by GC_freebsd_main_stack_base which assumes the GC to be initialized from the main thread. On Linux, pthread_attr_getstack() is used instead.

ivmai commented 7 years ago

Aha, pthread_attr_get_np should be used in stead of pthread_getattr_np in GC_get_main_stack_base. Also pthread_attr_init(&attr) should be called before pthread_getattr_np. Please try to replace the above one, replace LINUX to FREEBSD in: if defined(LINUX) && !defined(NO_PTHREAD_GETATTR_NP)

ivmai commented 7 years ago

If this hack works for you, I will fix it in a portable way

dimpase commented 7 years ago

The hack does work, see https://github.com/paurkedal/bdwgc/pull/1 for the actual patch and more comments on the patch.

Are there in GC more thread-related differences between FreeBSD and Linux/OSX that I must be aware of?

Now I am hitting the next problem, seemingly related to GC_register_my_thread/GC_unregister_my_thread, which ECL people say must be always used (via the corresponding ECL wrappers), but we don't, and it works on Linux/OSX, but appears to be a cause of crashes on FreeBSD. Is there Linux/OSX-only code in GC which does automatic (un)registering of threads?

ivmai commented 7 years ago

Please test commit 49d7fe7 - it should solve GC_register_my_thread issue as well.

dimpase commented 7 years ago

Turns out that the current GC (7.7) also causes similar problems on Linux (my bad, I should have checked this earlier). I know that 7.6 passed all out tests, but already GC-7.6 with Sagemath on Linux is prone to a crash I trigger interactively. And with 7.7 tests fail...

Please bear with me for few days while I sort this out before we can get back to FreeBSD. In the worst case I'll try backporting your patches to GC 7.2, which we (Sagemath) currently still use, although AFAIK Debian already uses 7.4.4 with Sagemath they ship, so something is missing in the puzzle now.

So I'll be looking down at a chain of regressions from GC 7.2 to 7.4.4.to 7.6 to 7.7 :-(

ivmai commented 7 years ago

Please also test with release-7_2 and release-7_4 branches

dimpase commented 7 years ago

I already narrowed it down to between 7.4.4 (does crash---although I'll check with Debian people whether they saw this and are aware of this issue - perhaps they somehow manage to avoid it) and our lightly patched 7.2f, which does work.

Could you perhaps point out tags for (semi)functioning pre-releases between these two points? (Before resorting to git bisect ...).

ivmai commented 7 years ago
  1. Please try v7.4.4 with --disable-disclaim configure option.
  2. Check other 7.4.x versions: gc7_4_2, gc7_4_0, gc7_3alpha2
ivmai commented 7 years ago
  1. Also check gc7_2 (without suffix) if it works

I.e. is tags first (before resorting to bisect by commit) Unfortunately, even the difference between gc7_2 and gc7_3alpha2 is 10 KLOC (even ignoring spaces, scripts, c++ code, tools, tests).

ivmai commented 7 years ago

Does commit 49d7fe7 solve the initial issue (GC_init from non-main thread on FreeBSD)? If yes, I suggest to close this issue and open another one for the crash on linux.

dimpase commented 7 years ago

OK, great, thanks a lot, it appears that https://github.com/ivmai/bdwgc/commit/49d7fe77033f0bdabd50a2912cd58effd85452c1 applied as patch to gc-7.6.0 produces GC working in our application on FreeBSD just as good as on Linux.

I would say that a straight C test for this functionality would be very good to have in https://github.com/ivmai/bdwgc/tree/master/tests ---I probably ought to contribute one...

Apologies for noise regarding crashes on Linux; it appears that up to and including gc-7.6.0 they have nothing to do with GC, these are our own bugs. I will investigate the latest GC master to see what goes on there, and report in a new issue.

ivmai commented 7 years ago

It looks like this patch breaks kFreeBSD build. So, I will make a fix for the patch.

ivmai commented 7 years ago

I would say that a straight C test for this functionality would be very good to have in https://github.com/ivmai/bdwgc/tree/master/tests ---I probably ought to contribute one...

Right. You are welcome.

ivmai commented 7 years ago

I would say that a straight C test for this functionality would be very good to have in tests

Such test already exists initsecondarythread.c but it has an exclusion for FreeBSD - to be removed.

dimpase commented 7 years ago

OK, good, on FreeBSD with gc-7.6.0 patched by 49d7fe7 I tried commenting out GC_INIT() in initsecondarythread.c on https://github.com/ivmai/bdwgc/blob/master/tests/initsecondarythread.c#L79 and this test passes (and as a sanity check, without the patch from 49d7fe7 this test gives a segfault).

(I am not sure how to write a proper condition for skipping this line on FreeBSD, thus only this rudimentary check, sorry).

ivmai commented 7 years ago

Here is the fix (39ca0b5) of my previous commit (49d7fe7). Now it works for kFreeBSD. Please check 49d7fe7+39ca0b5 - if works for you then I will squash them and push to release-7_6 branch.

dimpase commented 7 years ago

shouldn't checking for pthread_np.h be done in configure.ac?

dimpase commented 7 years ago

OK, 49d7fe7+39ca0b5 work for me (on FreeBSD 11.0 and on Linux too), in the master branch as well as on top of 7.6.

ivmai commented 7 years ago

shouldn't checking for pthread_np.h be done in configure.ac?

It's up to the design. configure is optional. Historically, multiple build systems are supported and I'm trying not to break them. If possible, the features of the target system should be identified in gcconfig.h - this allows avoid duplicating feature detection in the build scripts. And, not to mention, it allows to compile bdwgc from the command line easily (e.g. gcc -I include tests/test.c extra/gc.c)