Closed GoogleCodeExporter closed 8 years ago
Can you please provide a backtrace? This definitely doesn't happen on my
platforms.
Original comment by dorma...@rydia.net
on 20 Apr 2015 at 7:00
Built with ./configure -O0 -g
FreeBSD 10.1-RELEASE-p9 #0
Memcached 1.4.23
libevent2-2.0.22_1
GDB output
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 801806800 (LWP 100454/memcached)]
0x0000000800854631 in pthread_mutex_destroy () from /lib/libthr.so.3
(gdb) bt
#0 0x0000000800854631 in pthread_mutex_destroy () from /lib/libthr.so.3
#1 0x0000000000418aef in item_stats_totals (add_stats=0x40e5a0 <append_stats>,
c=0x80195a200) at items.c:506
#2 0x00000000004145a6 in get_stats (stat_type=0x0, nkey=0, add_stats=0x40e5a0
<append_stats>, c=0x80195a200) at slabs.c:309
#3 0x0000000000412026 in process_stat (c=0x80195a200, tokens=0x7fffffbfdbc0,
ntokens=2) at memcached.c:2808
#4 0x00000000004100a8 in process_command (c=0x80195a200, command=0x801967800
"stats") at memcached.c:3491
#5 0x0000000000409c95 in try_read_command (c=0x80195a200) at memcached.c:3796
#6 0x00000000004086a8 in drive_machine (c=0x80195a200) at memcached.c:4141
#7 0x000000000040380e in event_handler (fd=26, which=2, arg=0x80195a200) at
memcached.c:4386
#8 0x0000000800c7b3be in event_base_loop () from
/usr/local/lib/libevent-2.0.so.5
#9 0x000000000041edcb in worker_libevent (arg=0x801924000) at thread.c:379
#10 0x000000080084e4f5 in pthread_create () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
(gdb) info threads
6 Thread 801806400 (LWP 100441/memcached) 0x0000000800fd204a in kevent () from /lib/libc.so.7
* 5 Thread 801806800 (LWP 100454/memcached) 0x0000000800854631 in
pthread_mutex_destroy () from /lib/libthr.so.3
4 Thread 801806c00 (LWP 100455/memcached) 0x0000000800fd204a in kevent () from /lib/libc.so.7
3 Thread 801807000 (LWP 100456/memcached) 0x0000000800fd204a in kevent () from /lib/libc.so.7
2 Thread 801807400 (LWP 100457/memcached) 0x0000000800fd204a in kevent () from /lib/libc.so.7
1 Thread 801807800 (LWP 100458/memcached) 0x00000008008598cc in __error () from /lib/libthr.so.3
(gdb) thread 1
[Switching to thread 1 (Thread 801807800 (LWP 100458/memcached))]#0
0x00000008008598cc in __error () from /lib/libthr.so.3
(gdb) bt
#0 0x00000008008598cc in __error () from /lib/libthr.so.3
#1 0x0000000800857d8e in _pthread_cond_wait () from /lib/libthr.so.3
#2 0x000000000041cd2d in assoc_maintenance_thread (arg=0x0) at assoc.c:261
#3 0x000000080084e4f5 in pthread_create () from /lib/libthr.so.3
#4 0x0000000000000000 in ?? ()
(gdb) thread 6
[Switching to thread 6 (Thread 801806400 (LWP 100441/memcached))]#0
0x0000000800fd204a in kevent () from /lib/libc.so.7
(gdb) bt
#0 0x0000000800fd204a in kevent () from /lib/libc.so.7
#1 0x0000000800c8de58 in evutil_secure_rng_add_bytes () from
/usr/local/lib/libevent-2.0.so.5
#2 0x0000000800c7afa3 in event_base_loop () from
/usr/local/lib/libevent-2.0.so.5
#3 0x00000000004067e9 in main (argc=2, argv=0x7fffffffeba0) at memcached.c:5724
(gdb)
Original comment by arne.cl...@gmail.com
on 21 Apr 2015 at 6:49
Some more info on Stack Frame 1 (item_stats_totals)
(gdb) frame 1
#1 0x0000000000418aef in item_stats_totals (add_stats=0x40e5a0 <append_stats>,
c=0x80195a200) at items.c:506
506 pthread_mutex_lock(&lru_locks[i]);
Current language: auto; currently minimal
(gdb) list
501 for (n = 0; n < MAX_NUMBER_OF_SLAB_CLASSES; n++) {
502 int x;
503 int i;
504 for (x = 0; x < 4; x++) {
505 i = n | lru_type_map[x];
506 pthread_mutex_lock(&lru_locks[i]);
507 totals.expired_unfetched += itemstats[i].expired_unfetched;
508 totals.evicted_unfetched += itemstats[i].evicted_unfetched;
509 totals.evicted += itemstats[i].evicted;
510 totals.reclaimed += itemstats[i].reclaimed;
(gdb) print i
$1 = 255
(gdb) print n
$2 = 63
(gdb) print lru_locks
$3 = {0x801818150, 0x8018181a0, 0x8018181f0, 0x801818240, 0x801818290,
0x8018182e0, 0x801818330, 0x801818380, 0x8018183d0, 0x801818420, 0x801818470,
0x8018184c0, 0x801818510, 0x801818560, 0x8018185b0, 0x801818600, 0x801818650, 0x8018186a0, 0x8018186f0, 0x801818740, 0x801818790, 0x8018187e0, 0x801818830,
0x801818880, 0x8018188d0, 0x801818920, 0x801818970, 0x8018189c0, 0x801818a10, 0x801818a60, 0x801818ab0, 0x801818b00, 0x801818b50, 0x801818ba0, 0x801818bf0,
0x801818c40, 0x801818c90, 0x801818ce0, 0x801818d30, 0x801818d80, 0x801818dd0, 0x801818e20, 0x801818e70, 0x801818ec0, 0x801818f10, 0x801818f60, 0x801818fb0,
0x8018c5060, 0x8018c50b0, 0x8018c5100, 0x8018c5150, 0x8018c51a0, 0x8018c51f0, 0x8018c5240, 0x8018c5290, 0x8018c52e0, 0x8018c5330, 0x8018c5380, 0x8018c53d0,
0x8018c5420, 0x8018c5470, 0x8018c54c0, 0x8018c5510, 0x8018c5560, 0x8018c55b0, 0x8018c5600, 0x8018c5650, 0x8018c56a0, 0x8018c56f0, 0x8018c5740, 0x8018c5790,
0x8018c57e0, 0x8018c5830, 0x8018c5880, 0x8018c58d0, 0x8018c5920, 0x8018c5970, 0x8018c59c0, 0x8018c5a10, 0x8018c5a60, 0x8018c5ab0, 0x8018c5b00, 0x8018c5b50,
0x8018c5ba0, 0x8018c5bf0, 0x8018c5c40, 0x8018c5c90, 0x8018c5ce0, 0x8018c5d30, 0x8018c5d80, 0x8018c5dd0, 0x8018c5e20, 0x8018c5e70, 0x8018c5ec0, 0x8018c5f10,
0x8018c5f60, 0x8018c5fb0, 0x8018c6060, 0x8018c60b0, 0x8018c6100, 0x8018c6150, 0x8018c61a0, 0x8018c61f0, 0x8018c6240, 0x8018c6290, 0x8018c62e0, 0x8018c6330,
0x8018c6380, 0x8018c63d0, 0x8018c6420, 0x8018c6470, 0x8018c64c0, 0x8018c6510, 0x8018c6560, 0x8018c65b0, 0x8018c6600, 0x8018c6650, 0x8018c66a0, 0x8018c66f0,
0x8018c6740, 0x8018c6790, 0x8018c67e0, 0x8018c6830, 0x8018c6880, 0x8018c68d0, 0x8018c6920, 0x8018c6970, 0x8018c69c0, 0x8018c6a10, 0x8018c6a60, 0x8018c6ab0,
0x8018c6b00, 0x8018c6b50, 0x8018c6ba0, 0x8018c6bf0, 0x8018c6c40, 0x8018c6c90, 0x8018c6ce0, 0x8018c6d30, 0x8018c6d80, 0x8018c6dd0, 0x8018c6e20, 0x8018c6e70,
0x8018c6ec0, 0x8018c6f10, 0x8018c6f60, 0x8018c6fb0, 0x8018c7060, 0x8018c70b0, 0x8018c7100, 0x8018c7150, 0x8018c71a0, 0x8018c71f0, 0x8018c7240, 0x8018c7290,
0x8018c72e0, 0x8018c7330, 0x8018c7380, 0x8018c73d0, 0x8018c7420, 0x8018c7470, 0x8018c74c0, 0x8018c7510, 0x8018c7560, 0x8018c75b0, 0x8018c7600, 0x8018c7650,
0x8018c76a0, 0x8018c76f0, 0x8018c7740, 0x8018c7790, 0x8018c77e0, 0x8018c7830, 0x8018c7880, 0x8018c78d0, 0x8018c7920, 0x8018c7970, 0x8018c79c0, 0x8018c7a10,
0x8018c7a60, 0x8018c7ab0, 0x8018c7b00, 0x8018c7b50, 0x8018c7ba0, 0x8018c7bf0, 0x8018c7c40, 0x8018c7c90, 0x8018c7ce0, 0x8018c7d30, 0x8018c7d80, 0x8018c7dd0,
0x8018c7e20, 0x8018c7e70, 0x8018c7ec0, 0x8018c7f10, 0x8018c7f60, 0x8018c7fb0, 0x8018c8060, 0x8018c80b0, 0x8018c8100...}
(gdb) print lru_locks[i]
$4 = 0xc
(gdb) print x
$5 = 3
(gdb) print lru_type_map[x]
$6 = 192
(gdb)
Original comment by arne.cl...@gmail.com
on 21 Apr 2015 at 6:58
#define POWER_LARGEST 255
pthread_mutex_t lru_locks[POWER_LARGEST];
I guess there is a +1 missing for the second line as
- MAX_NUMBER_OF_SLAB_CLASSES = 64
- n = [0..63]
- lru_type_map[3] == NOEXP_LRU == 192 == 0b11000000
- 63 = 0b00111111
=> 255 is totally reasonable
Original comment by arne.cl...@gmail.com
on 21 Apr 2015 at 7:14
items.h:27 -> extern pthread_mutex_t lru_locks[POWER_LARGEST+1];
thread.c:40 -> pthread_mutex_t lru_locks[POWER_LARGEST+1];
thread.c:728 -> for (i = 0; i <= POWER_LARGEST; i++) {
Fixes the problem for me but I did not check if it does make sense with all the
arrays depending on POWER_LARGEST.
Original comment by arne.cl...@gmail.com
on 21 Apr 2015 at 7:32
Shit. I fixed an off-by-one with MAX_SLAB_ETC as one of the last commits. Guess
that pushed the POWER_LARGEST value over the limit :(
Wonder why that doesn't crash for me. I'll take a more careful look at this as
soon as I can.
Original comment by dorma...@rydia.net
on 21 Apr 2015 at 7:46
I don't get a hard crash here, but I do get a hard lockup. This looks like a
similar situation to the comment I just made on #404- things get weird when
hardware lock elision comes into play. It works fine, no lockups, on my older
x86_64 machine.
0x00007f9c3d9b2883 in epoll_wait () from /usr/lib/libc.so.6
(gdb) thread apply all backtrace
Thread 6 (Thread 0x7f9c3d8c9700 (LWP 12250)):
#0 0x00007f9c3d9b2883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007f9c3deaf4d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007f9c3de9a61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x000000000041add4 in worker_libevent (arg=0x817480) at thread.c:379
#4 0x00007f9c3dc74374 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007f9c3d9b227d in clone () from /usr/lib/libc.so.6
Thread 5 (Thread 0x7f9c3d0c8700 (LWP 12251)):
#0 0x00007f9c3dc7c64c in __lll_lock_wait () from /usr/lib/libpthread.so.0
#1 0x00007f9c3dc7f090 in __lll_lock_elision () from /usr/lib/libpthread.so.0
#2 0x00000000004183c9 in item_stats_totals (add_stats=add_stats@entry=0x4067d0
<append_stats>, c=c@entry=0x7f9c34026200) at items.c:506
#3 0x0000000000414c57 in get_stats (stat_type=stat_type@entry=0x0,
nkey=nkey@entry=0, add_stats=add_stats@entry=0x4067d0 <append_stats>,
c=c@entry=0x7f9c34026200)
at slabs.c:309
#4 0x000000000040ef1d in process_stat (ntokens=<optimized out>,
tokens=0x7f9c3d0c7cd0, c=0x7f9c34026200) at memcached.c:2808
#5 process_command (c=c@entry=0x7f9c34026200, command=<optimized out>) at
memcached.c:3491
#6 0x000000000040fc39 in try_read_command (c=0x7f9c34026200) at
memcached.c:3796
#7 drive_machine (c=0x7f9c34026200) at memcached.c:4141
#8 event_handler (fd=<optimized out>, which=<optimized out>,
arg=0x7f9c34026200) at memcached.c:4386
#9 0x00007f9c3de9aca6 in event_base_loop () from /usr/lib/libevent-2.0.so.5
#10 0x000000000041add4 in worker_libevent (arg=0x8185c8) at thread.c:379
#11 0x00007f9c3dc74374 in start_thread () from /usr/lib/libpthread.so.0
#12 0x00007f9c3d9b227d in clone () from /usr/lib/libc.so.6
Thread 4 (Thread 0x7f9c3c8c7700 (LWP 12252)):
#0 0x00007f9c3d9b2883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007f9c3deaf4d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007f9c3de9a61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x000000000041add4 in worker_libevent (arg=0x819710) at thread.c:379
#4 0x00007f9c3dc74374 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007f9c3d9b227d in clone () from /usr/lib/libc.so.6
Thread 3 (Thread 0x7f9c3c0c6700 (LWP 12253)):
#0 0x00007f9c3d9b2883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007f9c3deaf4d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007f9c3de9a61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x000000000041add4 in worker_libevent (arg=0x81a858) at thread.c:379
#4 0x00007f9c3dc74374 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007f9c3d9b227d in clone () from /usr/lib/libc.so.6
Thread 2 (Thread 0x7f9c3b8c5700 (LWP 12254)):
#0 0x00007f9c3dc799af in pthread_cond_wait@@GLIBC_2.3.2 () from
/usr/lib/libpthread.so.0
#1 0x000000000041a510 in assoc_maintenance_thread (arg=<optimized out>) at
assoc.c:261
#2 0x00007f9c3dc74374 in start_thread () from /usr/lib/libpthread.so.0
#3 0x00007f9c3d9b227d in clone () from /usr/lib/libc.so.6
Thread 1 (Thread 0x7f9c3e2de700 (LWP 12249)):
#0 0x00007f9c3d9b2883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007f9c3deaf4d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007f9c3de9a61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x0000000000404eea in main (argc=0, argv=0x0) at memcached.c:5724
works:
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
tpr_shadow vnmi flexpriority
broken:
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x12
cpu MHz : 3874.539
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
aes xsave avx f16c rdrand lahf_lm abm ida arat pln pts dtherm tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid
rtm xsaveopt
Original comment by dpmc...@gmail.com
on 23 Apr 2015 at 2:01
I suppose I forgot to say that the above GDB backtrace was captured after
running the steps to reproduce. The telnet session returned no response to the
stats command for me, and then I captured the above backtrace.
Original comment by dpmc...@gmail.com
on 23 Apr 2015 at 2:11
Seems like everything using LARGEST_ID and POWER_LARGEST were comparing via "<"
already, so upping LARGEST_ID by one should cover all of the other off-by-one
cases I introduced earlier.
Can you folks please try https://github.com/memcached/memcached (master branch)
and ping this issue on if it repairs your problem completey? All tests should
run.
Original comment by dorma...@rydia.net
on 24 Apr 2015 at 7:05
Does not compile with clang:
items.c:274:28: error: comparison of constant 256 with expression of type
'uint8_t' (aka 'unsigned char') is always true
[-Werror,-Wtautological-constant-out-of-range-compare]
assert(it->slabs_clsid < LARGEST_ID);
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
/usr/include/assert.h:54:21: note: expanded from macro 'assert'
#define assert(e) ((e) ? (void)0 : __assert(__func__, __FILE__, \
^
items.c:298:28: error: comparison of constant 256 with expression of type
'uint8_t' (aka 'unsigned char') is always true
[-Werror,-Wtautological-constant-out-of-range-compare]
assert(it->slabs_clsid < LARGEST_ID);
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
/usr/include/assert.h:54:21: note: expanded from macro 'assert'
#define assert(e) ((e) ? (void)0 : __assert(__func__, __FILE__, \
^
items.c:1097:28: error: comparison of constant 256 with expression of type
'uint8_t' (aka 'unsigned char') is always true
[-Werror,-Wtautological-constant-out-of-range-compare]
assert(it->slabs_clsid < LARGEST_ID);
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
/usr/include/assert.h:54:21: note: expanded from macro 'assert'
#define assert(e) ((e) ? (void)0 : __assert(__func__, __FILE__, \
^
items.c:1119:28: error: comparison of constant 256 with expression of type
'uint8_t' (aka 'unsigned char') is always true
[-Werror,-Wtautological-constant-out-of-range-compare]
assert(it->slabs_clsid < LARGEST_ID);
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
/usr/include/assert.h:54:21: note: expanded from macro 'assert'
#define assert(e) ((e) ? (void)0 : __assert(__func__, __FILE__, \
^
items.c:1145:28: error: comparison of constant 256 with expression of type
'uint8_t' (aka 'unsigned char') is always true
[-Werror,-Wtautological-constant-out-of-range-compare]
assert(it->slabs_clsid < LARGEST_ID);
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
Original comment by arne.cl...@gmail.com
on 24 Apr 2015 at 7:20
seriously, clang? I have to go swap all of the <'s and <='s? :/
Well if you're still reading and aren't building with clang I'd like to know if
this fixes the issue still.
Original comment by dorma...@rydia.net
on 24 Apr 2015 at 7:31
This fixes it for me, all tests now pass. Great stuff, I shall go ahead and
test this release hard on a real machine now.
Original comment by markhag...@gmail.com
on 24 Apr 2015 at 8:36
Well it does have a point because that assert does not really make much sense
as a byte is always < 256.
Original comment by arne.cl...@gmail.com
on 24 Apr 2015 at 8:37
Removed the asserts -> compiles -> stats do now work again.
Original comment by arne.cl...@gmail.com
on 24 Apr 2015 at 8:39
Stats works for me now as well on the box that was unresponsive, once the patch
is applied. Thanks!
Original comment by dpmc...@gmail.com
on 24 Apr 2015 at 2:47
1.4.24 is out, please try if you haven't already
Original comment by dorma...@rydia.net
on 25 Apr 2015 at 9:01
Original issue reported on code.google.com by
arne.cl...@gmail.com
on 20 Apr 2015 at 8:07