signalwire / freeswitch

FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. From a Raspberry Pi to a multi-core server, FreeSWITCH can unlock the telecommunications potential of any device.
https://freeswitch.com/#getting-started

Freeswitch leaves stale calls #2372

Open greenbea opened 5 months ago

greenbea commented 5 months ago

Describe the bug

FreeSWITCH leaves a lot of stale calls in memory and in the database. In the past we saw this only rarely, and the stale calls were only in the database, not in memory. However, the same day we started using the latest FreeSWITCH version, 1.10.11, we saw a lot of stale calls, and uuid_exists returned true for them, indicating the calls were still in memory. That makes this a more serious bug.

I've seen this very same issue reported on the Slack channel: after an update to the latest FreeSWITCH, there is an increase in stale calls.

Package version or git hash

greenbea commented 5 months ago

I took a core dump and found deadlocks causing the calls to get stuck. The gdb output is from just one call, but all the others look very similar.

The gdb output is from the following commands.

thread 78
bt
frame 3
p *mutex
thread 79
bt
frame 3
p *mutex
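
The two `p *mutex` dumps above can be cross-checked for a deadlock by following who owns what. The sketch below is only an illustration, assuming the printed structure exposes the owning thread id (on glibc, `pthread_mutex_t` carries it in `__data.__owner`). The function name and the thread ids in the usage line are hypothetical and are not taken from the actual core dump.

```python
def find_deadlock_cycle(waits_for):
    """Return the threads that form a wait cycle, or None if there is none.

    waits_for maps a blocked thread id to the id of the thread that owns
    the mutex it is waiting on (for example, the owner value read from
    a `p *mutex` dump). This is a hypothetical helper for illustration.
    """
    for start in waits_for:
        path = []
        seen = set()
        current = start
        # Follow the "waits for" chain until it ends or repeats.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        # A chain that comes back to its starting thread is a deadlock.
        if current == start and path:
            return path
    return None


# Hypothetical ids: each thread waits on a mutex owned by the other.
print(find_deadlock_cycle({1001: 1002, 1002: 1001}))
```

If the owner read from each dump points at the other thread, the two threads are waiting on each other and neither can make progress.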

greenbea commented 5 months ago

It appears this is the same issue as https://github.com/signalwire/freeswitch/issues/2290

technophreak commented 5 months ago

I have the same issue with 1.10.11. I will try 1.10.9 and 1.10.10 and report back with my findings.

wesmu commented 5 months ago

Hi everyone, is this issue linked to audio transcoding? We don't have any transcoding on our side, so I would like to know before upgrading to the latest version. Thanks in advance. Best regards

bferreirq commented 5 months ago

Hello, which OS version are you running? Since we upgraded to Debian 12 with FreeSWITCH v1.10.11, we have had the same issue: a lot of calls are not hung up by FS.

After checking in detail, a lot of packets (200 OK, BYE) seem to be randomly ignored by FS. After a few days we have a lot of "ghost" calls in memory, as shown by fs_cli.

Is it the same issue for you?

image

Regards,

wesmu commented 5 months ago

> Hello, which OS version are you running? Since we upgraded to Debian 12 with FreeSWITCH v1.10.11, we have had the same issue: a lot of calls are not hung up by FS.
>
> After checking in detail, a lot of packets (200 OK, BYE) seem to be randomly ignored by FS. After a few days we have a lot of "ghost" calls in memory, as shown by fs_cli.
>
> Is it the same issue for you?
>
> Regards,

Take a look at the Contact header sent in the INVITE. It seems the other end is sending all of its answers to an address that FS is not listening on (assuming FS sent the second INVITE...)

bferreirq commented 5 months ago

Yes, everything is fine on the Contact side. The problem is totally random, across several thousand calls per day.

technophreak commented 5 months ago

I am using Debian 12. I have not noticed any issue with 200 OK. I will take a closer look and report if I find anything in that regard.

greenbea commented 5 months ago

> Hi everyone, is this issue linked to audio transcoding? We don't have any transcoding on our side, so I would like to know before upgrading to the latest version. Thanks in advance. Best regards

The issue isn't related to transcoding. See issue https://github.com/signalwire/freeswitch/issues/2290 for the details of this deadlock.

bferreirq commented 5 months ago

@technophreak I tried to reproduce it in different ways with SIPp, in all directions, and could not.

But it happens randomly, and after a few days we have around 100 fake active calls on FreeSWITCH (fs_cli -x show calls) due to this problem (when we check, the 200 OK or BYE was ignored).

Of course, Contact, Via and the other fields look fine.

For me it started happening after the upgrade to Debian 12 and 1.10.10. I upgraded to 1.10.11 to test, but it is no better.

If I can help, tell me. Thanks a lot.
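
For anyone who wants to keep an eye on how many long-lived calls accumulate, a minimal sketch follows. It assumes that `fs_cli -x "show calls as json"` returns a JSON object with a `rows` list whose entries carry `uuid` and `created_epoch` fields; verify those field names against your own output before relying on it. The function name and the threshold are hypothetical.

```python
import json
import time


def calls_older_than(show_calls_json, max_age_seconds, now=None):
    """List (uuid, age_in_seconds) for calls older than max_age_seconds.

    show_calls_json is the raw text assumed to come from
    `fs_cli -x "show calls as json"`. The "rows", "uuid" and
    "created_epoch" names are assumptions about that output.
    """
    data = json.loads(show_calls_json)
    now = time.time() if now is None else now
    old = []
    # "rows" may be absent when there are no calls at all.
    for row in data.get("rows", []):
        age = now - int(row["created_epoch"])
        if age > max_age_seconds:
            old.append((row["uuid"], int(age)))
    # Oldest calls first.
    return sorted(old, key=lambda item: item[1], reverse=True)
```

Age alone does not prove that a call is stale, since legitimate calls can run for hours, so the result is only a list of candidates to inspect.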

technophreak commented 5 months ago

@bferreirq Yeah, in my case those stale calls are not even properly killed with uuid_kill; there are remnants. I suspect it is not directly related. Perhaps your 200 OK issue is just a symptom of the same issue.

greenbea commented 5 months ago

You cannot kill those calls because they're mutex locked. The only way to get rid of them is to restart freeswitch.

bferreirq commented 5 months ago

Yes, we already schedule restarts of FreeSWITCH to kill these calls.

technophreak commented 5 months ago

@bferreirq The issue I have is that those calls leave remnants (they still appear in show calls) after they get killed, and I only restart when there are no more calls in progress.

Although rare, we do have very long calls, so I can't just assume that a 3 hour call is a ghost call and I have no indication if this call is still really legitimately connected.

This is a very annoying issue.

technophreak commented 5 months ago

@greenbea Thanks for that info.

Have you found a way to know (via uuid_dump, for example) whether they are still really connected?

technophreak commented 5 months ago

In my specific scenario, the issue only occurred when transcoding inbound calls. It does not mean it only affects inbound calls, it's just that with my current configuration only inbound is being affected.

I've completed my tests and I can confirm for sure that v1.10.8 and v1.10.9 do not have this issue.

I will go ahead and test version 1.10.10 now and report back.

bferreirq commented 5 months ago

> In my specific scenario, the issue only occurred when transcoding inbound calls. It does not mean it only affects inbound calls, it's just that with my current configuration only inbound is being affected.
>
> I've completed my tests and I can confirm for sure that v1.10.8 and v1.10.9 do not have this issue.
>
> I will go ahead and test version 1.10.10 now and report back.

Could you please share your SIP scenario?

technophreak commented 5 months ago

@bferreirq The calls that seem to trigger this issue most often are as follows:

technophreak commented 5 months ago

I can confirm I am also getting stale/stuck calls with v1.10.10.

As far as I am concerned, here is my diagnosis:

  • v1.10.8 : Not affected
  • v1.10.9 : Not affected
  • v1.10.10 : Affected
  • v1.10.11 : Affected

--

To add insult to injury, I cannot even use v1.10.9 because it seems to have issues with rxfax for one of my carriers.

phamhieptel4vn commented 3 months ago

> I can confirm I am also getting stale/stuck calls with v1.10.10.
>
> As far as I am concerned, here is my diagnosis:
>
>   • v1.10.8 : Not affected
>   • v1.10.9 : Not affected
>   • v1.10.10 : Affected
>   • v1.10.11 : Affected
>
> --
>
> To add insult to injury, I cannot even use v1.10.9 because it seems to have issues with rxfax for one of my carriers.

I have the same problem in 1.10.10-release. Although the actual call has ended, the channel inside FreeSWITCH has not been hung up, so the CDR has not been updated.

shaunjstokes commented 3 months ago

We had the same issue. You should check the following patch: https://github.com/signalwire/freeswitch/pull/2300

phamhieptel4vn commented 3 months ago

> We had the same issue, you should check the following patch. #2300

Could you tell me if this PR has resolved the issue?