Closed GoogleCodeExporter closed 8 years ago
Did this cause a bug or is this from an audit?
That little bit of code in the assoc thing was really awkward and feels wrong
to me, but that's where it ended.
The unlock is a bit of a bootstrapping issue that might go away if it's
restructured a little to use a "once on startup" flag. It unlocks right before
the thread sits on a condition, because when it wakes up from the condition the
first thing it does is lock the slab mover.
So if you add a pause before that resume it'll just deadlock when the expander
actually runs the first time.
If it's causing some bug we'll see about fixing it sooner than later, though?
Original comment by dorma...@rydia.net
on 3 Jul 2013 at 8:36
It caused a bug, but not on Linux (where the spurious pthread_mutex_unlock()
seems to be silently ignored) but rather a new operating system, whose
pthread_mutex_unlock() croaks if called on an already-unlocked mutex.
What if we add a call to pause in the *beginning* of the
assoc_maintenance_thread() function (not right before the resume) - wouldn't
this be the right fix? Then, any time the assoc_maintenance_thread() function
is working, the rebalancer is paused - except during the wait where the
rebalancer is explicitly resumed.
Original comment by nyha...@gmail.com
on 3 Jul 2013 at 8:57
For the record, here is a one-line patch I've been using to fix this bug, and
it seems to be working well:
--- src/assoc.c 2013-11-18 13:00:26.080026394 +0200
+++ src-save/assoc.c 2013-07-03 11:27:41.113465014 +0300
@@ -203,6 +203,7 @@
static void *assoc_maintenance_thread(void *arg) {
+ slabs_rebalancer_pause();
while (do_run_maintenance_thread) {
int ii = 0;
Original comment by n...@cloudius-systems.com
on 13 May 2014 at 5:50
fixed in 'next'.. no longer calls resume without pausing first.
Original comment by dorma...@rydia.net
on 28 Dec 2014 at 3:18
Original issue reported on code.google.com by
nyha...@gmail.com
on 3 Jul 2013 at 8:27