Open icymoon opened 6 years ago
That's an assertion failure. I believe you have enabled debug mode. In that mode keepalived sets MAX_ALLOC_LIST to 2048, which means you can reload at most 2048 times. If you still want to use debug mode, you can raise MAX_ALLOC_LIST in lib/memory.h; otherwise, stop using debug mode. For more information, read the code in lib/memory.c.
I don't think I reloaded that many times... I reloaded about 113 times...
Sorry, my description was wrong and may have misled you. MAX_ALLOC_LIST is not the number of reloads. I just found this issue in keepalived (https://github.com/acassen/keepalived/issues/390). So why not try it? If you read the code in lib/memory.c, you will see what I mean. The file begins with "ifdef --debug".
Yes, I enabled debug to reproduce the previous problem. I read that issue (acassen/keepalived#390), and I think I can disable the vrrp process and give it a try. Is that OK? I don't need vrrp online. Thank you very much, @mscbg :)
You can give it a try, but I don't think it will work. keepalived_malloc()/REALLOC() are not used only in the vrrp process.
I disabled debug, then configured and reloaded keepalived with two conf files that differ only in their real servers. It was terminated with signal 6 about 10 times overnight. Keepalived is started with -C: ./keepalived -d -C -f /etc/keepalived/keepalived.conf -S 6
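To illustrate why a fixed-size allocation table can trip an assert without any link to the number of reloads, here is a minimal sketch. It is not keepalived's code: TRACK_SLOTS, track_table, tracked_malloc and tracked_free are hypothetical names, and the table is deliberately tiny. The assumption is only that debug mode records every live allocation in a bounded table and asserts when the table is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, tiny stand-in for a limit such as MAX_ALLOC_LIST. */
#define TRACK_SLOTS 8

/* One slot per live allocation; NULL marks a free slot. */
static void *track_table[TRACK_SLOTS];

/* Count the allocations that are currently recorded. */
static size_t tracked_live(void)
{
    size_t n = 0;
    for (size_t i = 0; i < TRACK_SLOTS; i++)
        if (track_table[i] != NULL)
            n++;
    return n;
}

/* Return 1 if one more allocation can be recorded, 0 if the table is full. */
static int tracker_has_room(void)
{
    return tracked_live() < TRACK_SLOTS;
}

/* malloc wrapper that records the pointer. When the table is already
 * full, the assert fires and the process aborts with signal 6. */
static void *tracked_malloc(size_t size)
{
    assert(tracker_has_room());
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (track_table[i] == NULL) {
            track_table[i] = p;
            break;
        }
    }
    return p;
}

/* free wrapper that releases the slot so it can be reused. */
static void tracked_free(void *p)
{
    if (p == NULL)
        return;
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (track_table[i] == p) {
            track_table[i] = NULL;
            break;
        }
    }
    free(p);
}
```

In this sketch the limit is on allocations that are live at the same time. A configuration that creates many list elements, or allocations that are never released across reloads, would fill the table long before any particular reload count is reached.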
Program terminated with signal 6, Aborted.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
(gdb)
Or this:
Core was generated by `./keepalived -d -C -f /etc/keepalived/keepalived.conf -S 6'.
Program terminated with signal 11, Segmentation fault.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
(gdb)
Without strip, I think it should be this stack:
Program terminated with signal 6, Aborted.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
So you ran keepalived many times with the command './keepalived -d -C -f /etc/keepalived/keepalived.conf -S 6'? Did you see any log lines in /var/log/messages like "198121 Jan 31 13:01:21 10 Keepalived[28307]: Healthcheck child process(951) died: Respawning"?
And can you attach a configuration file? I will try to reproduce it. We have run keepalived online for a long time, and it seems to work well. I believe that if it is configured well, it will run stably. Anyway, keepalived may really have many bugs; I have seen many crash issues in keepalived's GitHub repository.
Seems related: https://github.com/iqiyi/dpvs/issues/126
Two keepalived conf files; only the config item alpha differs. Reload again and again, with usleep 300.
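The reproduction described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the reporter's actual tooling: the two variant file names are hypothetical, the reload is assumed to be triggered by sending SIGHUP to the running keepalived (the signal keepalived uses to reload its configuration), and "usleep 300" is read as a 300 microsecond pause between reloads.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical names for the two configuration variants. */
static const char *const conf_variants[2] = {
    "/etc/keepalived/keepalived.conf.a",
    "/etc/keepalived/keepalived.conf.b",
};

/* Alternate between the two variants on successive iterations. */
static const char *variant_for(unsigned long iteration)
{
    return conf_variants[iteration % 2];
}

/* Copy src over dst. Returns 0 on success, -1 on any I/O error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Swap the live config and signal a reload, `rounds` times, pausing
 * pause_us microseconds after each reload. Returns 0 on success. */
static int reload_loop(pid_t pid, const char *live_conf,
                       unsigned long rounds, long pause_us)
{
    struct timespec ts;
    ts.tv_sec = pause_us / 1000000L;
    ts.tv_nsec = (pause_us % 1000000L) * 1000L;

    for (unsigned long i = 0; i < rounds; i++) {
        if (copy_file(variant_for(i), live_conf) != 0)
            return -1;
        if (kill(pid, SIGHUP) != 0)  /* ask the daemon to reload */
            return -1;
        nanosleep(&ts, NULL);
    }
    return 0;
}
```

A caller would pass the PID of the running keepalived parent process, for example reload_loop(pid, "/etc/keepalived/keepalived.conf", 1000, 300), and then check for core files.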
# gdb ./keepalived core.90055
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/xxxx/dpvs/bin/keepalived...done.
[New LWP 90055]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./keepalived -d -f /etc/keepalived/keepalived.conf -S 6'.
Program terminated with signal 6, Aborted.
#0 0x00007f22b24701f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f22b24701f7 in raise () from /lib64/libc.so.6
#1 0x00007f22b24718e8 in abort () from /lib64/libc.so.6
#2 0x00007f22b2469266 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f22b2469312 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000041eede in keepalived_malloc (size=size@entry=24, file=file@entry=0x432731 "list.c", function=function@entry=0x4327bc <__FUNCTION__.3195> "alloc_element",
#5 0x0000000000422c15 in alloc_element () at list.c:36
#6 list_add (l=0x2466910, data=0x2467e30) at list.c:43
#7 0x000000000041d769 in process_stream (keywords_vec=) at parser.c:437
#8 0x000000000041d77e in process_stream (keywords_vec=) at parser.c:441
#9 0x000000000041d77e in process_stream (keywords_vec=) at parser.c:441
#10 0x000000000041d85c in read_conf_file (conf_file=conf_file@entry=0x7ffdc4b83777 "/etc/keepalived/keepalived.conf") at parser.c:226
#11 0x000000000041df99 in init_data (conf_file=0x7ffdc4b83777 "/etc/keepalived/keepalived.conf", init_keywords=0x40c450) at parser.c:472
#12 0x00000000004056ee in start_check () at check_daemon.c:110
#13 0x0000000000405889 in reload_check_thread (thread=) at check_daemon.c:212
#14 0x000000000042136d in thread_call (thread=0x7ffdc4b81d00) at scheduler.c:759
#15 launch_scheduler () at scheduler.c:782
#16 0x00000000004058cd in start_check_child () at check_daemon.c:311
#17 0x0000000000403127 in start_keepalived () at main.c:80
#18 main (argc=, argv=) at main.c:303