redis / hiredis

Minimalistic C client for Redis >= 1.2
BSD 3-Clause "New" or "Revised" License
6.15k stars 1.8k forks source link

Program crashes due to Segmentation fault or double free using AsyncCommand (PUBLISH..) #1258

Closed BinduRao2018 closed 2 months ago

BinduRao2018 commented 2 months ago

System (Ubuntu22.04) - using libhiredis-dev with libevent-dev provided within the distribution (i.e hiredis -0.14 & libevent-2.1)

Here is the code snippet:

callback context redis_publish_reply() { free (str1); free (str2); }

thread1: { ... malloc (str1); malloc (str2); redisAsyncCommand (ctx, redis_publish_reply, privdata, "PUBLISH %s %s", str1, str2); }

where str1 and str2 are allocated in thread1 and freed in the callback context.

michael-grunder commented 2 months ago

Hi,

The version of Hiredis you're using, v0.14, is quite out of date compared to the current stable version, which is v1.2.0.

I'd need a bit more detail about your code to be of any help. Can you provide a minimally reproducable example (a complete code listing I can compile to reproduce the bug)? If not a full call-stack could help.

It's important to note that Hiredis isn't thread-safe. So, if you're sharing a redisContext across multiple threads without proper synchronization you will encounter undefined behavior.

I'm happy to take a look if you can provide a compilable example

BinduRao2018 commented 2 months ago

Thanks @michael-grunder for your quick response. I am using mutex to access the context across different threads. I will build an example code to reproduce the issue and share it.

We are using the old version as it is the only available version from the Ubuntu 22.04 distribution. Is the latest version(v1.2.0) compatible with libevent available across various linux distributions?

michael-grunder commented 2 months ago

We are using the old version as it is the only available version from the Ubuntu 22.04 distribution. Is the latest version(v1.2.0) compatible with libevent available across various linux distributions

It should be compatible with any of the event libraries we have adapters for.

I just checked and it wasn't until ubuntu 24.04 that they updated from v0.14 to v1.1.0.

BinduRao2018 commented 2 months ago

Based on your information, I have integrated our software with hiredis - 1.1.0. I see a crash with corrupted size vs. prev_size while consolidating - Process finished with exit code 134 (interrupted by signal 6: SIGABRT) Here is a call stack:

Thread 8 "xrt" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff3ff9640 (LWP 28517)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737287001664) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737287001664)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737287001664) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737287001664, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7c89676 in __libc_message (action=action@entry=do_abort, 
    fmt=fmt@entry=0x7ffff7ddbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7ca0cfc in malloc_printerr (
    str=str@entry=0x7ffff7dde800 "corrupted size vs. prev_size while consolidating")
    at ./malloc/malloc.c:5664
#7  0x00007ffff7ca2f22 in _int_free (av=0x7fffdc000030, p=0x7fffdc01ba50, have_lock=<optimised out>)
    at ./malloc/malloc.c:4606
#8  0x00007ffff7ca5453 in __GI___libc_free (mem=<optimised out>) at ./malloc/malloc.c:3391
#9  0x00007ffff7e64b0d in redisBufferWrite ()
   from /home/bindurao/workbench/hiredis/cmake-build-release/libhiredis.so.1.1.0
#10 0x00007ffff7e61c3e in redisAsyncWrite ()
   from /home/bindurao/workbench/hiredis/cmake-build-release/libhiredis.so.1.1.0
#11 0x00007ffff7f82e16 in redisLibeventHandler (fd=4, event=4, arg=0x7fffe8001f90)
    at /home/bindurao/workbench/hiredis/adapters/libevent.h:74
#12 0x00007ffff7ae4f58 in ?? () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#13 0x00007ffff7ae68a7 in event_base_loop () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#14 0x00007ffff7f84f31 in xrt_redis_event_thread (arg=0x5555555dc9f0)
    at /home/bindurao/workbench/v2.1-branch/xrt/src/c/bridge/redis/redis_bridge.c:528
#15 0x00007ffff7c94ac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#16 0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
michael-grunder commented 2 months ago

Hey,

I can see that there is a memory corruption but there's no reason to believe hiredis is causing the corruption. My money would be on improperly sharing of the redisAsyncContext across multiple threads.

It is useful to run your program under valgrind and see what errors it reports. It will catch any memory read/write violations in real time and help narrow down the problem. You can also try using the rr debugger which allows you to record a run of your application and then replay it as may times as you like.

I'm curious why you're using threads at all. Sometimes it's worthwhile to use an async event library and multiple threads but often you just need one thread with multiple callbacks.

You mentioned using a mutex to control access to the redisAsyncContext struct. It's important to note that simply wrapping access to the struct behind a mutex is likely insufficient and probably not what you even want.

I recently wrote a little quick-and-dirty toy program that uses async hiredis to bounce messages back and forth between two messaging channels. I'm not sure if this is at all similar to what you're doing but you can see I don't use threads. I use two redisAsyncContext structs attached to different parts of the event loop.

michael-grunder commented 2 months ago

I'm going to close this issue since it's unlikely a bug in hiredis.

If you can provide a reproducer that I can run though, I'm happy to take a deeper look.