usmanm / redis-tdigest

t-digest module for Redis
http://redismodules.com/modules/redis-tdigest/
MIT License
the module got coredump #7

Open vvhungy opened 6 years ago

vvhungy commented 6 years ago

Our Redis instance running the tdigest module dumps core every 4-5 days. gdb gives the backtrace below. Could you please take a look at it? Thanks.

(gdb) bt
#0  0x00007f015ab41495 in raise () from /lib64/libc.so.6
#1  0x00007f015ab42c75 in abort () from /lib64/libc.so.6
#2  0x00007f015ab7f3a7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f015ab84dee in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f015ab87c80 in _int_free () from /lib64/libc.so.6
#5  0x00007f0151ffa3f5 in tdigestCompress (t=0x7f0130086750) at src/tdigest.c:176
#6  0x00007f0151ff953c in TDigestTypeAdd_RedisCommand (ctx=0x7ffccde4c350, argv=<value optimized out>, argc=<value optimized out>) at src/command.c:110
#7  0x0000000000490c90 in RedisModuleCommandDispatcher (c=0x7f0150a2be40) at module.c:466
#8  0x0000000000429337 in call (c=0x7f0150a2be40, flags=15) at server.c:2224
#9  0x00000000004299a5 in processCommand (c=0x7f0150a2be40) at server.c:2505
#10 0x0000000000439b2d in processInputBuffer (c=0x7f0150a2be40) at networking.c:1330
#11 0x0000000000424aed in aeProcessEvents (eventLoop=0x7f015463b0a0, flags=11) at ae.c:421
#12 0x0000000000424e0b in aeMain (eventLoop=0x7f015463b0a0) at ae.c:464
#13 0x000000000042da22 in main (argc=<value optimized out>, argv=0x7ffccde4c648) at server.c:3885

(gdb) bt full
#0  0x00007f015ab41495 in raise () from /lib64/libc.so.6
    No symbol table info available.
#1  0x00007f015ab42c75 in abort () from /lib64/libc.so.6
    No symbol table info available.
#2  0x00007f015ab7f3a7 in __libc_message () from /lib64/libc.so.6
    No symbol table info available.
#3  0x00007f015ab84dee in malloc_printerr () from /lib64/libc.so.6
    No symbol table info available.
#4  0x00007f015ab87c80 in _int_free () from /lib64/libc.so.6
    No symbol table info available.
#5  0x00007f0151ffa3f5 in tdigestCompress (t=0x7f0130086750) at src/tdigest.c:176
    unmerged_centroids = 0x1f20a30
    unmerged_weight = <value optimized out>
    num_unmerged = <value optimized out>
    old_num_centroids = 630
    i = <value optimized out>
    j = 630
    args = {t = 0x7f0130086750, centroids = 0x1f21200, idx = 630, weight_so_far = 393204, k1 = 399.2966162507218, min = 0.14999999999999999, max = 511.43741628850984}
#6  0x00007f0151ff953c in TDigestTypeAdd_RedisCommand (ctx=0x7ffccde4c350, argv=<value optimized out>, argc=<value optimized out>) at src/command.c:110
    key = 0x7f0154623000
    type = 6
    num_added = 1
    values = 0x7f0130bc6810
    counts = 0x7f0130bc6818
    i = <value optimized out>
    t = 0x7f0130086750
    total_count = <value optimized out>
#7  0x0000000000490c90 in RedisModuleCommandDispatcher (c=0x7f0150a2be40) at module.c:466
    cp = <value optimized out>
    ctx = {getapifuncptr = 0x491320, module = 0x7f015461b0c0, client = 0x7f0150a2be40, blocked_client = 0x0, amqueue = 0x7f013e59c300, amqueue_len = 16, amqueue_used = 1, flags = 2,
      postponed_arrays = 0x0, postponed_arrays_count = 0, blocked_privdata = 0x0, keys_pos = 0x0, keys_count = 0, pa_head = 0x7f0130bc6800}
#8  0x0000000000429337 in call (c=0x7f0150a2be40, flags=15) at server.c:2224
    dirty = 23916351
    start = 1517577099955319
    duration = <value optimized out>
    client_old_flags = 0
    prev_also_propagate = {ops = 0x0, numops = 0}
#9  0x00000000004299a5 in processCommand (c=0x7f0150a2be40) at server.c:2505
    No locals.
#10 0x0000000000439b2d in processInputBuffer (c=0x7f0150a2be40) at networking.c:1330
    No locals.
#11 0x0000000000424aed in aeProcessEvents (eventLoop=0x7f015463b0a0, flags=11) at ae.c:421
    fe = 0x7f01542024a0
    mask = 1
    fd = 293
    rfired = 1
    j = <value optimized out>
    shortest = <value optimized out>
    tvp = <value optimized out>
    processed = <value optimized out>
    numevents = 1
#12 0x0000000000424e0b in aeMain (eventLoop=0x7f015463b0a0) at ae.c:464
    No locals.
#13 0x000000000042da22 in main (argc=<value optimized out>, argv=0x7ffccde4c648) at server.c:3885
    tv = {tv_sec = 1517061481, tv_usec = 930087}
    j = <value optimized out>
    hashseed = "1a86649ef203608a"
    background = <value optimized out>

usmanm commented 6 years ago

Would it be possible for you to share a script that reproduces the issue? Or share the core dump?

vvhungy commented 6 years ago

It seems to be a race-condition bug. The core dump file is around 640 MB, so I put it on mega.nz; the download link: CORE-DUMP

The binary was compiled from my forked source code (I added a TDIGEST.CENTROIDS command myself). You can find it at: https://github.com/vvhungy/redis-tdigest.

Some more info:

Server

redis_version:4.0.6
redis_build_id:8ff1ddc2d25bbf03
redis_mode:standalone
os:Linux 2.6.32-696.16.1.el6.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:sync-builtin
gcc_version:4.4.7

usmanm commented 6 years ago

Great, thanks so much! I'll dig into it sometime this week.

As a side note, I'm curious why you had to implement the tdigest.centroids command. Was tdigest.debug not suitable for your use case?

vvhungy commented 6 years ago

Ah yes, the tdigest.debug result doesn't work well with phpredis rawCommand, so I created tdigest.centroids to reformat the result. The command name also doesn't fit my usage :)

usmanm commented 6 years ago

Can you also share your compiled binaries?

vvhungy commented 6 years ago

The redis-server and tdigest.so binaries (running on CentOS 6.9 x86_64): download HERE

usmanm commented 6 years ago

I'm on Ubuntu, so it seems I can't load the core dump properly.

usmanm@usmanm-puget:~/Downloads $ gdb ./redis-server core.18554 
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./redis-server...done.

warning: exec file is newer than core file.
[New LWP 18554]
[New LWP 18556]
[New LWP 18557]
[New LWP 18558]

warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the expected address (wrong library or version mismatch?)

warning: Could not load shared library symbols for 6 libraries, e.g. /lib64/libm.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `/abserver/redis/redis-server 0.0.0.0:6392                   '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f015ab41495 in ?? ()
[Current thread is 1 (LWP 18554)]

I wonder if it'll work in a CentOS VM?
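The gdb warnings above point at missing CentOS shared libraries rather than a damaged core, and gdb itself suggests `set sysroot`. A minimal sketch of that route follows, assuming the libraries from the crashing host (and tdigest.so) have been copied into a local directory at their original paths. The directory name `./centos69-root` and the file name `core-debug.gdb` are hypothetical; only the gdb commands themselves are standard.

```shell
#!/bin/sh
# Sketch: open a CentOS core dump with gdb on another distro.
# SYSROOT is a hypothetical directory holding copies of the crashing host's
# shared libraries (e.g. centos69-root/lib64/libc.so.6), laid out under the
# same absolute paths they had on the CentOS machine.
SYSROOT="./centos69-root"
BINARY="./redis-server"
CORE="core.18554"

# Emit a gdb command file. The sysroot has to be set before the core is
# loaded, so the core is opened with core-file instead of on the command line.
make_gdb_commands() {
    cat <<EOF
set sysroot $SYSROOT
file $BINARY
core-file $CORE
info sharedlibrary
bt full
EOF
}

make_gdb_commands > core-debug.gdb

# Run gdb only when it is installed and both input files are present.
if command -v gdb >/dev/null 2>&1 && [ -f "$BINARY" ] && [ -f "$CORE" ]; then
    gdb -batch -x core-debug.gdb
fi
```

If `info sharedlibrary` still lists libraries without symbols, `set solib-search-path` can point gdb at additional directories. A VM running the same CentOS release avoids the copying step entirely.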

vvhungy commented 6 years ago

I think it should work on a CentOS VM; make sure you use CentOS 6.9.

vvhungy commented 6 years ago

Hi @usmanm, any progress on this?

usmanm commented 6 years ago

Hey @vvhungy, sorry, I have not been able to spend time on this. I will try to look into it this weekend.

vvhungy commented 6 years ago

Yes, I hope you can fix this soon.

rkarthick commented 2 years ago

Is the error before the core dump free(): invalid next size (normal)?

vvhungy commented 2 years ago

@rkarthick I'm not sure where to find the core dump message. But looking at the core dump backtrace, the line that causes the crash is src/tdigest.c:176 (please see my first comment).

rkarthick commented 2 years ago

Looks like the issue was with the redis version 6.2.5. Downgrading the redis version to 6.0.8 fixed it for us.
