Open vvhungy opened 6 years ago
Would it be possible for you to share a script that reproduces the issue? Or share the core dump?
It seems to be a race-condition bug. The core-dump file is around 640M, so I put it on mega.nz; the link to download: CORE-DUMP
The binary was compiled from my forked source code (I added a TDIGEST.CENTROIDS command myself). You can find it at: https://github.com/vvhungy/redis-tdigest.
Some more info:
redis_version:4.0.6
redis_build_id:8ff1ddc2d25bbf03
redis_mode:standalone
os:Linux 2.6.32-696.16.1.el6.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:sync-builtin
gcc_version:4.4.7
Great, thanks so much! I'll dig into it sometime this week.
As a side note, I'm curious why you had to implement the tdigest.centroids command. Was tdigest.debug not good for your use case?
Ah yes, the tdigest.debug result does not work well with phpredis rawCommand, so I created tdigest.centroids to reformat the result. The function name also does not fit my usage :)
Can you also share your compiled binaries?
Binary files redis-server and tdigest.so (built and running on CentOS 6.9 x86_64): download HERE
I'm on Ubuntu, so it seems like I can't attach to the core dump.
usmanm@usmanm-puget:~/Downloads $ gdb ./redis-server core.18554
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./redis-server...done.
warning: exec file is newer than core file.
[New LWP 18554]
[New LWP 18556]
[New LWP 18557]
[New LWP 18558]
warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for 6 libraries, e.g. /lib64/libm.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `/abserver/redis/redis-server 0.0.0.0:6392 '.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f015ab41495 in ?? ()
[Current thread is 1 (LWP 18554)]
I wonder if it'll work in a CentOS VM?
I think it should work on a CentOS VM; make sure you use CentOS 6.9.
Hi @usmanm, any progress on this?
Hey @vvhungy, sorry, I have not been able to spend time on this. I will try to look into it this weekend.
Yes, I hope you can fix this soon.
Is the error before the core dump "free(): invalid next size (normal)"?
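For context on that message: glibc prints "free(): invalid next size (normal)" when free() finds that the size field of the chunk following the one being freed is implausible, which usually means an earlier write ran past the end of a heap buffer. The following is only a minimal sketch of that class of bug and of the bounds check that prevents it. `buffer_t`, `buffer_init`, `buffer_append`, and `buffer_free` are hypothetical names for illustration and are not taken from the module's src/tdigest.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity buffer; NOT the module's actual struct. */
typedef struct {
    double *values;
    size_t  len; /* slots in use */
    size_t  cap; /* slots allocated */
} buffer_t;

/* Allocate room for `cap` values. Returns 0 on success, -1 on failure. */
int buffer_init(buffer_t *b, size_t cap) {
    assert(b != NULL);
    b->values = malloc(cap * sizeof *b->values);
    if (b->values == NULL) return -1;
    b->len = 0;
    b->cap = cap;
    return 0;
}

/* Append with a bounds check. Without the check, a write to
 * values[cap] can land on the next heap chunk's header, and a
 * later free() may then abort with "invalid next size".
 * Returns 0 on success, -1 if the buffer is full. */
int buffer_append(buffer_t *b, double v) {
    assert(b != NULL);
    if (b->len >= b->cap) return -1;
    b->values[b->len++] = v;
    return 0;
}

/* Release the storage and reset the bookkeeping fields. */
void buffer_free(buffer_t *b) {
    assert(b != NULL);
    free(b->values);
    b->values = NULL;
    b->len = 0;
    b->cap = 0;
}
```

If the real code appends to a fixed-size array before compressing, a guard of this shape (or an assert in a debug build) would turn a silent overrun into an immediate, easy-to-locate failure.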
@rkarthick I'm not sure where to find the message printed before the core dump. But looking at the core-dump backtrace, the line that causes the crash is src/tdigest.c:176 (please see my first comment).
Looks like the issue was with Redis version 6.2.5. Downgrading Redis to 6.0.8 fixed it for us.
-- Karthick Ramachandran
Our Redis instance running with the tdigest module dumps core every 4-5 days. gdb gives the backtrace below. Could you please take a look at it? Thanks.
(gdb) bt
#0 0x00007f015ab41495 in raise () from /lib64/libc.so.6
#1 0x00007f015ab42c75 in abort () from /lib64/libc.so.6
#2 0x00007f015ab7f3a7 in __libc_message () from /lib64/libc.so.6
#3 0x00007f015ab84dee in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f015ab87c80 in _int_free () from /lib64/libc.so.6
#5 0x00007f0151ffa3f5 in tdigestCompress (t=0x7f0130086750) at src/tdigest.c:176
#6 0x00007f0151ff953c in TDigestTypeAdd_RedisCommand (ctx=0x7ffccde4c350, argv=, argc=) at src/command.c:110
#7 0x0000000000490c90 in RedisModuleCommandDispatcher (c=0x7f0150a2be40) at module.c:466
#8 0x0000000000429337 in call (c=0x7f0150a2be40, flags=15) at server.c:2224
#9 0x00000000004299a5 in processCommand (c=0x7f0150a2be40) at server.c:2505
#10 0x0000000000439b2d in processInputBuffer (c=0x7f0150a2be40) at networking.c:1330
#11 0x0000000000424aed in aeProcessEvents (eventLoop=0x7f015463b0a0, flags=11) at ae.c:421
#12 0x0000000000424e0b in aeMain (eventLoop=0x7f015463b0a0) at ae.c:464
#13 0x000000000042da22 in main (argc=, argv=0x7ffccde4c648) at server.c:3885
(gdb) bt full
#0 0x00007f015ab41495 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f015ab42c75 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f015ab7f3a7 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007f015ab84dee in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#4 0x00007f015ab87c80 in _int_free () from /lib64/libc.so.6
No symbol table info available.
#5 0x00007f0151ffa3f5 in tdigestCompress (t=0x7f0130086750) at src/tdigest.c:176
#6 0x00007f0151ff953c in TDigestTypeAdd_RedisCommand (ctx=0x7ffccde4c350, argv=, argc=) at src/command.c:110
#7 0x0000000000490c90 in RedisModuleCommandDispatcher (c=0x7f0150a2be40) at module.c:466
#8 0x0000000000429337 in call (c=0x7f0150a2be40, flags=15) at server.c:2224
#9 0x00000000004299a5 in processCommand (c=0x7f0150a2be40) at server.c:2505
No locals.
#10 0x0000000000439b2d in processInputBuffer (c=0x7f0150a2be40) at networking.c:1330
No locals.
#11 0x0000000000424aed in aeProcessEvents (eventLoop=0x7f015463b0a0, flags=11) at ae.c:421
#12 0x0000000000424e0b in aeMain (eventLoop=0x7f015463b0a0) at ae.c:464
No locals.
#13 0x000000000042da22 in main (argc=, argv=0x7ffccde4c648) at server.c:3885