ivmai / bdwgc

The Boehm-Demers-Weiser conservative C/C++ Garbage Collector (bdwgc, also known as bdw-gc, boehm-gc, libgc)
https://www.hboehm.info/gc/
Other
2.98k stars 407 forks source link

Assertion failure in disclaim_test if manual VDB #235

Closed ivmai closed 6 years ago

ivmai commented 6 years ago

Build link: https://travis-ci.org/ivmai/bdwgc/jobs/425865062 Source: master (or 8.0.0)

Host: Linux/x64 How to build and run: ./configure --enable-gc-assertions --disable-munmap && make check CFLAGS_EXTRA="-D TEST_MANUAL_VDB"

Output (of disclaim_test): Threaded disclaim test. Assertion failure, line 119: !p->car || is_pair(p->car)

ivmai commented 6 years ago

If compiled with DEBUG_DISCLAIM_DESTRUCT, the output as follows ("car" is destroyed before "p") :

Threaded disclaim test. ... Construct 0xe88d58 = ((nil), (nil)) ... Construct 0xe8c248 = ((nil), (nil)) ... Construct 0xe8ec08 = (0xe8c248, 0xe88d58) ... Destruct 0xe8c248 = ((nil), (nil)) ... Destruct 0xe8ec08 = (0xe8c248, 0xe88d58) Assertion failure, line 119: !p->car || is_pair(p->car)

paurkedal commented 6 years ago

I cannot reproduce this myself from cac8158 with the given options on Ubuntu 18.04, but: A comment in mark.c gives me the suspicion that I might have failed to follow the logic of the GC cycles and that the following might be needed:

diff --git a/mark.c b/mark.c
index b1d59026..a7951374 100644
--- a/mark.c
+++ b/mark.c
@@ -1969,6 +1969,11 @@ STATIC struct hblk * GC_push_next_marked(struct hblk *h)
         if (NULL == h) ABORT("Bad HDR() definition");
 #     endif
     }
+#   ifdef ENABLE_DISCLAIM
+      if ((hhdr -> hb_flags & MARK_UNCONDITIONALLY) != 0) {
+        GC_push_unconditionally(h, hhdr);
+      } else
+#   endif
     GC_push_marked(h, hhdr);
     return(h + OBJ_SZ_TO_BLOCKS(hhdr -> hb_sz));
 }

It's a wild guess though.

ivmai commented 6 years ago

I cannot reproduce this myself

To reproduce it, I run ./disclaim_test in a loop in 3 terminals in parallel (and got my_assert violation after around 100 iterations.

ivmai commented 6 years ago

I think the assertion was because I resurrected the object by GC_ptr_store_and_dirty(&p->car, cd) in pair_dct. Now I've removed the store barrier which was not actually needed because we reclaim p after pair_dct completion (now the code is p->car = 0).

ivmai commented 6 years ago

After the change (GC_ptr_store_and_dirty -> p->car=0), I got the following error: Assertion failure, line 124: p->checksum == checksum

(It is reproduced after several dozens of disclaim_test runs even in a more simple configuration: ./configure --enable-gc-assertions --disable-parallel-mark --disable-munmap --disable-handle-fork --disable-thread-local-alloc && make check CFLAGS_EXTRA="-O0 -g -D TEST_MANUAL_VDB")

ivmai commented 6 years ago

And, again, sometimes I see: Assertion failure, line 120: !p->cdr || is_pair(p->cdr)

p is valid but p->cdr is already reclaimed (zero'ed, the 1st word points to the next element in the free list.)

ivmai commented 6 years ago

Not reproduced w/o -D TEST_MANUAL_VDB

ivmai commented 6 years ago

A comment in mark.c gives me the suspicion that I might have failed to follow the logic of the GC cycles and that the following might be needed

I will check it tomorrow.

paurkedal commented 6 years ago

I managed to trigger the error with 3 parallel runs as you suggested. My adjustments to mark.c did not make a difference, though.

paurkedal commented 6 years ago

I think the issue is that GC_ptr_store_and_dirty calls set_pht_entry_from_index without locking. I believe the latter (|=) translates to separate read and write, so if another thread does the same between the two operations on the same word, that thread's dirty bit will be lost.

I'm running some long test now with the following:

diff --git a/include/private/gc_priv.h b/include/private/gc_priv.h
index 4f2d3085..4d2fd7a0 100644
--- a/include/private/gc_priv.h
+++ b/include/private/gc_priv.h
@@ -2197,7 +2197,8 @@ GC_EXTERN GC_bool GC_print_back_height;
 # define GC_auto_incremental (GC_incremental && !GC_manual_vdb)

   GC_INNER void GC_dirty_inner(const void *p); /* does not require locking */
-# define GC_dirty(p) (GC_manual_vdb ? GC_dirty_inner(p) : (void)0)
+  GC_API_PRIV void GC_dirty_async(const void *p);
+# define GC_dirty(p) (GC_manual_vdb ? GC_dirty_async(p) : (void)0)
 # define REACHABLE_AFTER_DIRTY(p) GC_reachable_here(p)
 #endif /* !GC_DISABLE_INCREMENTAL */

diff --git a/os_dep.c b/os_dep.c
index 03cd7ecb..efbf8708 100644
--- a/os_dep.c
+++ b/os_dep.c
@@ -3723,6 +3723,18 @@ GC_INNER GC_bool GC_dirty_init(void)
     set_pht_entry_from_index(GC_dirty_pages, index); /* FIXME: concurrent */
   }

+  GC_API_PRIV void GC_dirty_async(const void *p)
+  {
+    word index = PHT_HASH(p);
+
+#   if defined(MPROTECT_VDB)
+      /* Do not update GC_dirty_pages if it should be followed by the   */
+      /* page unprotection.                                             */
+      GC_ASSERT(GC_manual_vdb);
+#   endif
+    async_set_pht_entry_from_index(GC_dirty_pages, index); /* FIXME: concurrent */
+  }
+
   /* Retrieve system dirty bits for the heap to a local buffer (unless  */
   /* output_unneeded).  Restore the systems notion of which pages are   */
   /* dirty.  We assume that either the world is stopped or it is OK to  */

As you can see there is a FIXME which made me look bit closer.

ivmai commented 6 years ago

I see. I will check it as well but I'm not sure async_set_pht_entry_from_index has enough thread-safety for GC_dirty. It seems to me that we need AO_or here.

To @hboehm: what do you think?

paurkedal commented 6 years ago

The issue hasn't popped up after the change, but it also eventually deadlocks, so there is probably a double locking involved. AO_or should avoid that, so I'll try it.

ivmai commented 6 years ago

Yes, async_set_pht_entry_from_index should fix the race except there could be a deadlock if it is called also from the write fault handler but GC_dirty_inner and write fault handler usage should be mutually exclusive - if manual VDB is used then MPROTECT_VDB functionality is off, and vice versa. So, I don't understand the reason of the deadlock. On Monday, I will try to reproduce this deadlock.

ivmai commented 6 years ago

AO_or is better but not always available.

ivmai commented 6 years ago

but it also eventually deadlocks, so there is probably a double locking involved.

Yes, this happens if thread is suspended in the middle of async_set_pht_entry_from_index.

ivmai commented 6 years ago

Fixed by 0c0e4cd.