iqiyi / dpvs

DPVS is a high performance Layer-4 load balancer based on DPDK.

keepalived terminated with signal 6, Aborted #119

Open icymoon opened 6 years ago

icymoon commented 6 years ago

I have two keepalived conf files that differ only in the config item alpha. I reload keepalived again and again, switching between them, with usleep 300.
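The reload loop described above can be sketched as follows. This is a minimal sketch, not the script used in the report: it assumes the reload is triggered by sending SIGHUP to the keepalived process, and the helper names `copy_file` and `reload_loop`, as well as the idea of copying one of two prepared files over the live config, are hypothetical.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std= modes */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src over dst; returns 0 on success, -1 on failure. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Alternate between two configs and ask keepalived to reload each time.
 * Assumption: SIGHUP is how the reload was triggered in the report. */
int reload_loop(pid_t keepalived_pid, const char *conf_a, const char *conf_b,
                const char *live_conf, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        const char *next = (i % 2 == 0) ? conf_a : conf_b;
        if (copy_file(next, live_conf) != 0)
            return -1;
        if (kill(keepalived_pid, SIGHUP) != 0)
            return -1;
        usleep(300); /* 300 microseconds between reloads */
    }
    return 0;
}
```

With a pause this short, a new reload request can arrive while the previous one is still being processed, which may be relevant when trying to reproduce the crash.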

# gdb ./keepalived core.90055
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/xxxx/dpvs/bin/keepalived...done.
[New LWP 90055]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./keepalived -d -f /etc/keepalived/keepalived.conf -S 6'.
Program terminated with signal 6, Aborted.
#0 0x00007f22b24701f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f22b24701f7 in raise () from /lib64/libc.so.6
#1 0x00007f22b24718e8 in abort () from /lib64/libc.so.6
#2 0x00007f22b2469266 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f22b2469312 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000041eede in keepalived_malloc (size=size@entry=24, file=file@entry=0x432731 "list.c", function=function@entry=0x4327bc <__FUNCTION__.3195> "alloc_element", line=line@entry=36) at memory.c:119
#5 0x0000000000422c15 in alloc_element () at list.c:36
#6 list_add (l=0x2466910, data=0x2467e30) at list.c:43
#7 0x000000000041d769 in process_stream (keywords_vec=) at parser.c:437
#8 0x000000000041d77e in process_stream (keywords_vec=) at parser.c:441
#9 0x000000000041d77e in process_stream (keywords_vec=) at parser.c:441
#10 0x000000000041d85c in read_conf_file (conf_file=conf_file@entry=0x7ffdc4b83777 "/etc/keepalived/keepalived.conf") at parser.c:226
#11 0x000000000041df99 in init_data (conf_file=0x7ffdc4b83777 "/etc/keepalived/keepalived.conf", init_keywords=0x40c450 ) at parser.c:472
#12 0x00000000004056ee in start_check () at check_daemon.c:110
#13 0x0000000000405889 in reload_check_thread (thread=) at check_daemon.c:212
#14 0x000000000042136d in thread_call (thread=0x7ffdc4b81d00) at scheduler.c:759
#15 launch_scheduler () at scheduler.c:782
#16 0x00000000004058cd in start_check_child () at check_daemon.c:311
#17 0x0000000000403127 in start_keepalived () at main.c:80
#18 main (argc=, argv=) at main.c:303

mscbg commented 6 years ago

That's an assertion failure. I believe you have enabled debug mode. In that mode, keepalived sets MAX_ALLOC_LIST to 2048, which means you can reload at most 2048 times. If you still want to use debug mode, you can increase MAX_ALLOC_LIST in lib/memory.h; otherwise, stop using debug mode. For more information, read the code in lib/memory.c.

icymoon commented 6 years ago

I don't think I reloaded that many times... I reloaded it about 113 times.

mscbg commented 6 years ago

Sorry, my description was wrong and may have misled you. MAX_ALLOC_LIST is not the number of reloads. I just found this issue in keepalived (https://github.com/acassen/keepalived/issues/390). So why not try it? And if you read the code in lib/memory.c, you will see what I mean. That file begins with "ifdef --debug".
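To illustrate why the assert can fire long before 2048 reloads, here is a minimal sketch, assuming the debug allocator records every live allocation in a fixed-size table and asserts when the table fills up. This is not the code from lib/memory.c; `MAX_TRACKED`, `dbg_malloc`, `dbg_free`, and `dbg_slots_in_use` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TRACKED 2048 /* hypothetical stand-in for MAX_ALLOC_LIST */

static void *tracked[MAX_TRACKED]; /* NULL marks a free slot */
static int high_water = 0;         /* slots ever needed at the same time */

/* Allocate memory and record the pointer in the first free slot. */
void *dbg_malloc(size_t size)
{
    int i = 0;
    while (i < high_water && tracked[i] != NULL)
        i++;
    if (i == high_water)
        high_water++;
    /* Trips when too many allocations are alive at once,
     * not after a fixed number of calls or reloads. */
    assert(high_water < MAX_TRACKED);
    tracked[i] = malloc(size);
    return tracked[i];
}

/* Release memory and mark its slot as reusable. */
void dbg_free(void *p)
{
    if (p == NULL)
        return;
    for (int i = 0; i < high_water; i++) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            break;
        }
    }
    free(p);
}

/* Number of table slots that have been needed so far. */
int dbg_slots_in_use(void)
{
    return high_water;
}
```

Under this assumption, the limit bounds the number of allocations alive at once. Freed slots are reused, so reloads that free everything never grow the table, while reloads that leak even a few blocks each keep consuming slots until the assert trips.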

icymoon commented 6 years ago

Yes, I enabled debug to reproduce the last problem. I read that issue (acassen/keepalived#390), and I think I can disable the vrrp process to give it a try. Is that OK? I don't need vrrp online. Thank you very much, @mscbg :)

mscbg commented 6 years ago

You can give it a try, but I don't think it will work. keepalived_malloc()/REALLOC() is used outside the vrrp process as well.

icymoon commented 6 years ago

I disabled debug, then configured and reloaded keepalived with two conf files that differ only in their real servers. It terminated with signal 6 about 10 times overnight. Keepalived was started with -C: ./keepalived -d -C -f /etc/keepalived/keepalived.conf -S 6

Program terminated with signal 6, Aborted.
#0 0x00007f0295d981f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f0295d981f7 in raise () from /lib64/libc.so.6
#1 0x00007f0295d998e8 in abort () from /lib64/libc.so.6
#2 0x00007f0295dd7f47 in __libc_message () from /lib64/libc.so.6
#3 0x00007f0295dddb54 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f0295ddf7aa in _int_free () from /lib64/libc.so.6
#5 0x0000000000405615 in ?? ()
#6 0x000000000041facd in ?? ()
#7 0x000000000040575b in ?? ()
#8 0x00000000004030a1 in ?? ()
#9 0x00007f0295d84c05 in __libc_start_main () from /lib64/libc.so.6
#10 0x000000000040315a in ?? ()
(gdb)

Or this:

Core was generated by `./keepalived -d -C -f /etc/keepalived/keepalived.conf -S 6'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f0295ddf30b in _int_free () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f0295ddf30b in _int_free () from /lib64/libc.so.6
#1 0x0000000000405615 in ?? ()
#2 0x000000000041facd in ?? ()
#3 0x000000000040575b in ?? ()
#4 0x0000000000405856 in ?? ()
#5 0x000000000041facd in ?? ()
#6 0x00000000004030b9 in ?? ()
#7 0x00007f0295d84c05 in __libc_start_main () from /lib64/libc.so.6
#8 0x000000000040315a in ?? ()
(gdb)

With an unstripped binary, I think it should be this stack:

Program terminated with signal 6, Aborted.
#0 0x00007f9862d5c1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f9862d5c1f7 in raise () from /lib64/libc.so.6
#1 0x00007f9862d5d8e8 in abort () from /lib64/libc.so.6
#2 0x00007f9862d9bf47 in __libc_message () from /lib64/libc.so.6
#3 0x00007f9862da1b54 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f9862da37aa in _int_free () from /lib64/libc.so.6
#5 0x0000000000405615 in reload_check_thread (thread=) at check_daemon.c:194
#6 0x000000000041fafd in thread_call (thread=0x7ffeecadf8f0) at scheduler.c:761
#7 launch_scheduler () at scheduler.c:784
#8 0x000000000040575b in start_check_child () at check_daemon.c:311
#9 0x00000000004030a1 in start_keepalived () at main.c:80
#10 main (argc=, argv=) at main.c:303
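An abort raised from malloc_printerr inside _int_free means glibc detected an inconsistent heap while freeing a block, for example a pointer that was already freed. The traces here do not show which pointer is involved, so the following is only a generic sketch of one common defensive pattern, not a fix for check_daemon.c; the macro name `SAFE_FREE` is hypothetical.

```c
#include <stdlib.h>

/* Free a pointer variable and reset it to NULL, so that freeing the
 * same variable again is harmless (free(NULL) is a no-op). */
#define SAFE_FREE(p)   \
    do {               \
        free(p);       \
        (p) = NULL;    \
    } while (0)
```

This only protects against freeing the same variable twice. It does not help when two different variables hold the same address, which is a case that can arise if old and new configuration data share pointers across a reload.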

mscbg commented 6 years ago

So you ran keepalived many times using the command './keepalived -d -C -f /etc/keepalived/keepalived.conf -S 6'? Did you see any log lines in /var/log/messages like "198121 Jan 31 13:01:21 10 Keepalived[28307]: Healthcheck child process(951) died: Respawning"?

mscbg commented 6 years ago

Can you also attach a configuration file? I will try to reproduce it. We have run keepalived in production for a long time, and it seems to work well. I believe that if it is configured well, it will run stably. That said, keepalived may really have many bugs; I have seen many crash issues in the keepalived GitHub repository.

mscbg commented 6 years ago

This seems related: https://github.com/iqiyi/dpvs/issues/126