signalwire / freeswitch

FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. From a Raspberry PI to a multi-core server, FreeSWITCH can unlock the telecommunications potential of any device.
https://freeswitch.com/#getting-started

Crash in mod_sofia #2417

Open zooptwopointone opened 3 months ago

zooptwopointone commented 3 months ago

Describe the bug

We have been seeing FreeSWITCH versions above 1.10.7 crash every once in a while, generally within 2 weeks. The older version does not have this issue. The crash always shows up in mod_sofia. I have about 5 servers running, and all of them eventually hit this problem.

To Reproduce

This is not something I can reproduce manually. It just occurs sometimes and seems to happen only under load. We mostly use bridged calls, mod_lua, and xml_curl for the dialplan.

Expected behavior

No crashing.

Package version or git hash

1.10.11-release+git~20231222T180831Z~f24064f7c9~64bit (git f24064f 2023-12-22 18:08:31Z 64bit)

Running on Debian 12

Trace logs

No useful logs that I can find.

Backtrace from core file

I have a full backtrace, but it is 20MB. I also didn't want to post the full details here, as the dump contains phone numbers.

`Core was generated by /usr/local/freeswitch/bin/freeswitch -u freeswitch -g freeswitch -ncwait -nonat.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f1b581c89bc in sofia_outgoing_channel (session=0x0, var_event=0x7f186410e6b0, outbound_profile=0x7f18640c6200, new_session=0x7f18458d8410, pool=, flags=, cancel_cause=0x0) at mod_sofia.c:4970
4970 memcpy(&sa.sin_addr, he->h_addr, sizeof(struct in_addr));`

Please let me know how I can provide the full information more securely.

zooptwopointone commented 1 month ago

I found that this code path is only hit if you have the sip_gethostbyname option enabled. Sometimes it was crashing inside the small chunk of code that handles the lookup results, at the memcpy.
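For anyone debugging the same crash: the faulting line copies from `he->h_addr`, so it would fault if `he` (presumably the `struct hostent *` returned by `gethostbyname()`) were NULL or pointed at data that another thread had changed. `gethostbyname()` returns NULL when a lookup fails, and POSIX does not require it to be thread-safe, though a single backtrace frame does not confirm that either is the cause here. Below is a minimal sketch of a lookup that avoids both concerns by using `getaddrinfo()`. The helper name `resolve_ipv4` is hypothetical and is not FreeSWITCH code.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of mod_sofia): resolve `host` to its first
 * IPv4 address. Returns 0 on success and -1 on any failure, so the caller
 * never copies from a NULL or stale result.
 */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    if (host == NULL || out == NULL) {
        return -1;
    }

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;      /* IPv4 only, matching struct in_addr */
    hints.ai_socktype = SOCK_DGRAM; /* avoid one result per socket type */

    /* getaddrinfo() allocates a per-call result list instead of returning
     * a pointer to shared static storage. */
    if (getaddrinfo(host, NULL, &hints, &res) != 0 || res == NULL) {
        return -1;
    }

    memcpy(out, &((struct sockaddr_in *)res->ai_addr)->sin_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

A caller would check the return value before using the address, for example `if (resolve_ipv4(hostname, &sa.sin_addr) != 0) { /* fail the call setup */ }`.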

I have since disabled this option, as I no longer need it, and that has resolved the crashes. Something is still going wrong in that code, though.

The systems had two different setups for this. One had no DNS cache and hit the DNS servers for every request. The others used nscd or dnsmasq. All of them had the same issue.

This is informational for anyone who hits this same issue.

andywolk commented 11 hours ago

We need a backtrace to analyze. One line from the backtrace is not enough.