nicolasff / webdis

A Redis HTTP interface with JSON output
https://webd.is
BSD 2-Clause "Simplified" License
2.82k stars 305 forks source link

Unexpected Segmentation fault #53

Closed yyyar closed 11 years ago

yyyar commented 11 years ago

Running webdis with ~100-150 connections. HBGET, MGET, GET, SUBSCRIBE, SMEMBERS, RPUSH commands are used.

Time-to-time, approximately once a day we have webdis crash. Here a gdb backtrace:

Program terminated with signal 11, Segmentation fault. (gdb) bt

0 0x00000000 in ?? ()

1 0xad445f70 in ?? ()

2 0x08058f44 in redisProcessCallbacks (ac=0xad445f70) at hiredis/async.c:403

3 0x08059109 in redisAsyncHandleRead (ac=0xad445f70) at hiredis/async.c:467

4 0x0804fbe0 in redisLibeventReadEvent (fd=378, event=2, arg=0xad45fb98) at ./hiredis/adapters/libevent.h:15

5 0xb773fce9 in event_base_loop () from /usr/lib/libevent-2.0.so.5

6 0xb7740a83 in event_base_dispatch () from /usr/lib/libevent-2.0.so.5

7 0x0804a656 in worker_main (p=0x97291f0) at worker.c:156

8 0xb771ed4c in start_thread () from /lib/i386-linux-gnu/tls/i686/nosegneg/libpthread.so.0

9 0xb765dace in clone () from /lib/i386-linux-gnu/tls/i686/nosegneg/libc.so.6

Starting webdis with ulimit -c 60000 "http_port": 80, "threads": 20, "pool_size": 5,

Redis 2.4.7 is on another machine.

Thank you in advance.

nicolasff commented 11 years ago

Hello Yaroslav,

Thanks for reporting this issue; it looks like webdis makes some assumptions about hiredis. I will have a look and try to reproduce it locally.

Nicolas

marcoPierleoni commented 11 years ago

Hello Nicolas,

I am not an expert on the webdis code, so I might have overlooked something, but I was wondering if it can be a race condition.

Redis drops connections if they are idle, when this happens hiredis calls the disconnect callback (pool_on_disconnect) and webdis removes that connection from the pool, then hiredis releases the redisContext. I do not see any locks on the pool of connections, so it can happen then when a connection is been dropped and the context freed, webdis can assign the same connection to a new HTTP request, which it will end up using a connection with a freed redisContext.

nicolasff commented 11 years ago

@marcoPierleoni thanks, I will look into it. There are no locks since webdis has a connection pool per worker thread, but there could still be a race among several clients.

I also discovered yesterday that there is a known issue with SUBSCRIBE in hiredis, but I'm not yet sure if it is related. I modified Redis to throw errors on SUBSCRIBE but that did not kill webdis.

Still investigating...

marcoPierleoni commented 11 years ago

I reproduced a crash.

I set redis with: maxclients 103

Then I started webdis with 100 connection. The first two calls to /SUBSCRIBE/channel are OK. With a third one webdis was killed.

If I remove -O3 in the compilation, webdis it is not killed anymore on the third SUBSCRIBE

nicolasff commented 11 years ago

Thanks a lot Marco, I'll try this scenario.

On 25 Jul 2012, at 23:49, Marco Pierleonireply@reply.github.com wrote:

I reproduced a crash.

I set redis with: maxclients 103

Then I started webdis with 100 connection. The first two calls to /SUBSCRIBE/channel are OK. With a third one webdis was killed.

If I remove -O3 in the compilation, webdis it is not killed anymore on the third SUBSCRIBE


Reply to this email directly or view it on GitHub: https://github.com/nicolasff/webdis/issues/53#issuecomment-7264688

nicolasff commented 11 years ago

Unfortunately it does look like the issue reported by @antirez in hiredis.

(gdb) bt
#0  0x0000000000413938 in __redisGetSubscribeCallback (ac=0x68ddc0, reply=0x6, dstcb=0x7ffff73fad00) at hiredis/async.c:324
#1  0x0000000000413c84 in redisProcessCallbacks (ac=0x68ddc0) at hiredis/async.c:399
#2  0x0000000000413ea0 in redisAsyncHandleRead (ac=0x68ddc0) at hiredis/async.c:467
#3  0x0000000000409248 in redisLibeventReadEvent (fd=643, event=2, arg=0x68def0) at ./hiredis/adapters/libevent.h:15
#4  0x00007ffff7bc7f53 in event_base_loop () from /usr/lib/libevent-1.4.so.2
#5  0x0000000000403261 in worker_main (p=0x628f30) at worker.c:156
#6  0x00007ffff79acefc in start_thread (arg=0x7ffff73fb700) at pthread_create.c:304
#7  0x00007ffff76e759d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8  0x0000000000000000 in ?? ()
(gdb) f 0
#0  0x0000000000413938 in __redisGetSubscribeCallback (ac=0x68ddc0, reply=0x6, dstcb=0x7ffff73fad00) at hiredis/async.c:324
324     if (reply->type == REDIS_REPLY_ARRAY) {

Hiredis fails to parse the reply object and leaves it broken (pointing to 0x6 in this case), after which dereferencing it crashes. A temporary solution is to bump the number of connections allowed by Redis, since webdis creates a new connection for every call to SUBSCRIBE, and does not reuse the pool connection for those clients.

My sincere apologies for this issue and the lack of an immediate fix, I will try to see if there is a workaround or if the fix in hiredis isn't too intrusive.

nicolasff commented 11 years ago

As described on the hiredis ticket, this should be fixed. I have updated webdis to use the latest version of hiredis and pushed it back to GitHub.

Thanks for reporting this problem!