signalwire / freeswitch

FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. From a Raspberry PI to a multi-core server, FreeSWITCH can unlock the telecommunications potential of any device.
https://freeswitch.com/#getting-started

FreeSWITCH crashes: Program terminated with signal SIGFPE, Arithmetic exception #1659

Open dxhgq-github opened 2 years ago

dxhgq-github commented 2 years ago

Hello,

FreeSWITCH crashes every one or two days, and the problem has lasted for about 2 months. The OS is CentOS 7.4, the FS version is 1.10.7, and sofia-sip is the latest version, 1.13.7, rebuilt last week. The CPUs are 12 x Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz.

The call flow is a simple inbound call to an IVR, then to callcenter according to the digit dialed. No call flows involve localhost in the setup, and no local resolver is used.

The file /etc/resolv.conf is empty.

The file /etc/hosts contains:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

The error messages in /var/log/messages are:

May 4 20:34:52 localhost kernel: traps: freeswitch[92647] trap divide error ip:7f2c9e58381a sp:7f2c7d58c070 error:0 in libsofia-sip-ua.so.0.6.0[7f2c9e457000+1ca000]
May 4 20:34:54 localhost freeswitch: #33[m#033[32m2022-05-04 20:34:34.034071 98.97% [INFO] switch_cpp.cpp:1465 From Lua callcnt_say_agent_num.lua: caller_uuid: f
May 4 20:34:54 localhost systemd: freeswitch.service: main process exited, code=killed, status=8/FPE
May 4 20:34:54 localhost systemd: Unit freeswitch.service entered failed state.
May 4 20:34:54 localhost systemd: freeswitch.service failed.
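The kernel trap line above gives both the faulting instruction pointer (`ip:7f2c9e58381a`) and the mapping of the library it falls in (`[7f2c9e457000+1ca000]`, base plus size). Subtracting the base from the instruction pointer gives the offset inside `libsofia-sip-ua.so.0.6.0`, which is useful when no core file is available. The helper below is a minimal sketch of that arithmetic; the function name `fault_offset` is made up for illustration.

```c
#include <stdint.h>

/* Compute the offset of a faulting instruction inside a mapped object,
 * given the "ip" value and the "[base+size]" pair from a kernel
 * "traps:" log line. Hypothetical helper, for illustration only.
 * Returns 0 and stores the offset on success, or -1 if ip lies
 * outside the mapping. */
int fault_offset(uint64_t ip, uint64_t base, uint64_t size, uint64_t *offset)
{
    if (ip < base || ip - base >= size)
        return -1;
    *offset = ip - base;
    return 0;
}
```

For the values in this log the offset is 0x12c81a. Assuming the library was built with debug symbols, something like `addr2line -f -e <path to libsofia-sip-ua.so.0.6.0> 0x12c81a` should name the function and source line, which can then be compared against frame 0 of the gdb backtrace.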

The gdb backtrace is:

Core was generated by `/usr/local/fs1.10.7/bin/freeswitch -nonat'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x00007f2c9e58381a in su_block_add (b=0x7f2c542282f0, p=0x7f2c542547f0) at su_alloc.c:342
342         h = (size_t)((uintptr_t)p % b->sub_n);
[Current thread is 1 (LWP 92647)]
(gdb) bt
#0  0x00007f2c9e58381a in su_block_add (b=0x7f2c542282f0, p=0x7f2c542547f0) at su_alloc.c:342
#1  0x00007f2c9e583dcf in sub_alloc (home=0x7f2c54250ef0, sub=0x7f2c542282f0, size=80, zero=do_calloc) at su_alloc.c:530
#2  0x00007f2c9e5858e3 in su_zalloc (home=0x7f2c54250ef0, size=80) at su_alloc.c:1577
#3  0x00007f2c9e50dd4b in outgoing_make_a_aaaa_query (orq=0x7f2c54254690) at nta.c:10463
#4  0x00007f2c9e50d350 in outgoing_resolve_next (orq=0x7f2c54254690) at nta.c:10244
#5  0x00007f2c9e50ee88 in outgoing_answer_srv (orq=0x7f2c54254690, q=0x7f2c5410b2c0, answers=0x7f2c541167b0) at nta.c:10830
#6  0x00007f2c9e5782c8 in sres_query_report_error (q=0x7f2c5410b2c0, answers=0x7f2c541167b0) at sres.c:2991
#7  0x00007f2c9e578675 in sres_resend_dns_query (res=0x7f2c54002990, q=0x7f2c5410b2c0, timeout=0) at sres.c:3091
#8  0x00007f2c9e5795c2 in sres_resolver_report_error (res=0x7f2c54002990, socket=590, errcode=111, remote=0x7f2c7d58c560, remotelen=16, info=0x7f2c7d58c4e0 "icmp type=3 code=3 reported by 127.0.0.1") at sres.c:3440
#9  0x00007f2c9e579281 in sres_resolver_error (res=0x7f2c54002990, socket=590) at sres.c:3354
#10 0x00007f2c9e57ef20 in sres_sofia_poll (magic=0x7f2c68001180, w=0x7f2c54001084, reg=0x7f2c540037c0) at sresolv.c:357
#11 0x00007f2c9e59079a in su_epoll_port_wait_events (self=0x7f2c540008c0, tout=690) at su_epoll_port.c:510
#12 0x00007f2c9e58cb9a in su_base_port_run (self=0x7f2c540008c0) at su_base_port.c:349
#13 0x00007f2c9e588e4a in su_port_run (self=0x7f2c540008c0) at su_port.h:326
#14 0x00007f2c9e589f24 in su_root_run (self=0x7f2c54001130) at su_root.c:819
#15 0x00007f2c9e58d956 in su_pthread_port_clone_main (varg=0x7f2c91233460) at su_pthread_port.c:343
#16 0x00007f2c9d5b9e65 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f2c9cc0e88d in __libc_ifunc_impl_list () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb)
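Frame 0 stops on `h = (size_t)((uintptr_t)p % b->sub_n);`. On x86, an integer remainder with a zero divisor raises a divide error that the kernel delivers as SIGFPE, so the trap suggests `b->sub_n` was 0 at that moment, which in turn may point to a block that was already freed or overwritten. The sketch below only illustrates that failure mode with a guarded version of the same expression; `struct demo_block` and `demo_slot` are hypothetical stand-ins, not the sofia-sip structures or a proposed patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocator block header. Only the slot
 * count matters for this illustration; the real sofia-sip structure
 * has more fields. */
struct demo_block {
    size_t sub_n;   /* number of hash slots; 0 means the block is unusable */
};

/* Hash a pointer into a slot index, as the expression in frame 0 does,
 * but refuse to divide when the slot count is zero.
 * Returns (size_t)-1 when the block cannot be used. */
size_t demo_slot(const struct demo_block *b, const void *p)
{
    if (b == NULL || b->sub_n == 0)
        return (size_t)-1;   /* unguarded, this would be the divide error */
    return (size_t)((uintptr_t)p % b->sub_n);
}
```

A guard like this would only hide the symptom. If the block header really is corrupted, the useful question is which code released or overwrote it while the DNS query was still pending.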

Could you please tell me how to capture the relevant messages, in particular any that involve localhost in the call setup, and how to check whether a local resolver is configured?

Please help, thank you! (Attachment: backtrace1107.log)
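One way to check whether any resolver is configured is to count the `nameserver` entries in /etc/resolv.conf. The following is a minimal sketch; the function names `is_nameserver_line` and `count_nameservers` are made up for illustration, and the parsing is deliberately simpler than what a real resolver does.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the line is a "nameserver <address>" entry, else 0.
 * Leading blanks are skipped; the keyword must be followed by
 * whitespace and then at least one more character. */
int is_nameserver_line(const char *line)
{
    const char *s = line;

    while (*s == ' ' || *s == '\t')
        s++;
    if (strncmp(s, "nameserver", 10) != 0)
        return 0;
    s += 10;
    if (*s != ' ' && *s != '\t')
        return 0;                       /* bare keyword or a longer word */
    while (*s == ' ' || *s == '\t')
        s++;
    return *s != '\0' && *s != '\n' && *s != '\r';
}

/* Count nameserver entries in a resolv.conf-style stream.
 * Lines longer than the buffer are not handled specially. */
int count_nameservers(FILE *fp)
{
    char line[512];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        count += is_nameserver_line(line);
    return count;
}
```

Calling `count_nameservers` on a stream opened with `fopen("/etc/resolv.conf", "r")` returns 0 for the empty file described in this report. The resolv.conf(5) manual page states that with no nameserver entries the system resolver defaults to the name server on the local machine. Whether the sofia-sip resolver applies the same default is an assumption here, but the backtrace is consistent with it: `errcode=111` is ECONNREFUSED on Linux, and "icmp type=3 code=3 reported by 127.0.0.1" is a port-unreachable reply from localhost, as would happen when a DNS query is sent to a local port with nothing listening.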

dragos-oancea commented 2 years ago

Can you try this PR? https://github.com/freeswitch/sofia-sip/pull/130. You must rebuild and reinstall sofia-sip, then rebuild FS. The bug looks related to the code resolving IPv6 / AAAA records.

dxhgq-github commented 2 years ago

Thank you, I will try it.

dxhgq-github commented 2 years ago

Hello,

I have downloaded and rebuilt sofia-sip and FreeSWITCH from freeswitch/sofia-sip#130, but FS still crashed after running for 18 hours. The signal has changed from SIGFPE to SIGABRT.

Core was generated by `/usr/local/fs1.10.7/bin/freeswitch -nonat'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f9a3713a337 in ssignal () from /lib64/libc.so.6
[Current thread is 1 (LWP 152931)]
(gdb) bt
#0  0x00007f9a3713a337 in ssignal () from /lib64/libc.so.6
#1  0x00007f9a3713ba28 in abort () from /lib64/libc.so.6
#2  0x00007f9a3717ce87 in __libc_message () from /lib64/libc.so.6
#3  0x00007f9a37185679 in _int_free () from /lib64/libc.so.6
#4  0x00007f9a38b77c06 in sub_alloc (home=0x7f99ec42fa40, sub=0x7f99ec34e6e0, size=80, zero=do_calloc) at su_alloc.c:478
#5  0x00007f9a38b798e3 in su_zalloc (home=0x7f99ec42fa40, size=80) at su_alloc.c:1577
#6  0x00007f9a38b01d4b in outgoing_make_a_aaaa_query (orq=0x7f99ec430dc0) at nta.c:10463
#7  0x00007f9a38b01350 in outgoing_resolve_next (orq=0x7f99ec430dc0) at nta.c:10244
#8  0x00007f9a38b02e88 in outgoing_answer_srv (orq=0x7f99ec430dc0, q=0x7f99ec430fe0, answers=0x7f99ec356780) at nta.c:10830
#9  0x00007f9a38b6c2c8 in sres_query_report_error (q=0x7f99ec430fe0, answers=0x7f99ec356780) at sres.c:2991
#10 0x00007f9a38b6c675 in sres_resend_dns_query (res=0x7f99ec002990, q=0x7f99ec430fe0, timeout=0) at sres.c:3091
#11 0x00007f9a38b6d5c2 in sres_resolver_report_error (res=0x7f99ec002990, socket=229, errcode=111, remote=0x7f9a22062560, remotelen=16, info=0x7f9a220624e0 "icmp type=3 code=3 reported by 127.0.0.1") at sres.c:3440
#12 0x00007f9a38b6d281 in sres_resolver_error (res=0x7f99ec002990, socket=229) at sres.c:3354
#13 0x00007f9a38b72f20 in sres_sofia_poll (magic=0x7f9a00001180, w=0x7f99ec20d494, reg=0x7f99ec0037c0) at sresolv.c:357
#14 0x00007f9a38b8479a in su_epoll_port_wait_events (self=0x7f99ec0008c0, tout=695) at su_epoll_port.c:510
#15 0x00007f9a38b80b9a in su_base_port_run (self=0x7f99ec0008c0) at su_base_port.c:349
#16 0x00007f9a38b7ce4a in su_port_run (self=0x7f99ec0008c0) at su_port.h:326
#17 0x00007f9a38b7df24 in su_root_run (self=0x7f99ec001130) at su_root.c:819
#18 0x00007f9a38b81956 in su_pthread_port_clone_main (varg=0x7f9a237d6450) at su_pthread_port.c:343
#19 0x00007f9a37bade65 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f9a3720288d in __libc_ifunc_impl_list () from /lib64/libc.so.6
#21 0x0000000000000000 in ?? ()
(gdb)

Please help, thank you! (Attachment: backtrace20220521.log)

dragos-oancea commented 2 years ago

I pushed another commit and rebased the branch, so you'll get more fixes. Please test.